by Jeremy Li
The 1000 Genomes Project was completed in 2015, culminating in its “phase 3 release” of phased genotypes from 2504 individuals at over 80 million variants, and it remains the largest fully open-sourced genomic dataset. The final variant callset was based on a combination of low-coverage whole-genome sequencing (at an average of around 7x), whole exome sequencing, and results from genotyping array assays of these individuals.
Earlier this year, the New York Genome Center published a preprint describing their recent resequencing of the samples in the 1000 Genomes Project, along with several hundred additional…
by Jeremy Li
Along with Warren Snelling and other colleagues at the USDA MARC (Meat Animal Research Center), we recently performed a study investigating the utility of imputation from low-pass sequencing data in cattle in the context of genomic prediction for beef steers. This study is now published in MDPI Genes and can be accessed at the following link. In this blog post, we briefly review the motivations behind this study and outline the main results.
Commercial genotyping arrays are now routinely used in genomic prediction for beef cattle due to the additional predictive validity that genomic…
by Jeremy Li, Data Scientist at Gencove
At Gencove, we’re continuously expanding the selection of species that can be used with imputation pipelines based on low-pass sequencing. Unlike cattle, which have the 1000 Bull Genomes Project, pigs do not currently have a large-scale public resequencing effort to characterize the genetic diversity of extant breeds used for agricultural purposes, and the existing literature on the performance of genotype imputation in pigs is primarily limited to genotyping array imputation [2,3,4].
To address this current shortcoming, we have recently developed and released a pipeline for low-pass sequencing…
This post is by Jeremy Li, Joe Pickrell, and Tomaz Berisa
At Gencove, we make it a priority to stay on the bleeding edge of genotype imputation from low-pass sequencing data. Until recently, there were few open-source imputation algorithms specifically designed to impute from sequencing reads rather than from genotypes called on genotyping arrays.
Earlier this year, Rubinacci et al. released GLIMPSE, a suite of software tools designed for imputation and phasing from low-pass sequencing data. We were interested in how this new method stacked up against our in-house imputation algorithm, “loimpute”.
We modified the recommended GLIMPSE workflow to allow…
At Gencove, we make an effort to keep current on the most up-to-date developments in the refinement and enhancement of existing imputation reference panels and genomic datasets. To this end, we have released an imputation pipeline for humans based on a liftover from the original 1000 Genomes Phase 3 release to the most recent build of the human reference genome (GRCh38); this is now accessible as a publicly available pipeline configuration.
Although we recently released a beta version of the 1000 Genomes reference panel based on a resequencing effort by the New York Genome Center, it is less well-validated and as such we currently recommend using this liftover release for production purposes instead. Preliminary testing shows that the lifted-over panel provides a ~2.5% increase in genotype concordance to truth data for an imputed European-ancestry individual relative to the ‘natively-called’ panel.
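For readers unfamiliar with the metric, the sketch below shows one common way to compute genotype concordance between an imputed sample and truth data. It is a minimal illustration under stated assumptions: genotypes are encoded as alternate-allele counts (0, 1, 2), missing calls are `None`, and the function name and toy values are hypothetical. It is not necessarily the exact procedure used in the testing described above.

```python
def genotype_concordance(imputed, truth):
    """Fraction of sites, genotyped in both call sets, where the genotypes agree.

    Assumption: each list holds alternate-allele counts (0, 1 or 2) per site,
    in the same site order, with None marking a missing call.
    """
    if len(imputed) != len(truth):
        raise ValueError("call sets must cover the same sites in the same order")
    # Only compare sites that have a call in both the imputed and truth sets.
    compared = [(i, t) for i, t in zip(imputed, truth)
                if i is not None and t is not None]
    if not compared:
        raise ValueError("no overlapping genotyped sites")
    matches = sum(1 for i, t in compared if i == t)
    return matches / len(compared)


# Toy, made-up genotypes (not real data): 7 comparable sites, 6 of which agree.
toy_truth = [0, 1, 2, 1, 0, None, 2, 1]
toy_imputed = [0, 1, 2, 0, 0, 1, 2, 1]
toy_concordance = genotype_concordance(toy_imputed, toy_truth)
```

Comparing two panels then amounts to computing this value for the same sample imputed against each panel and taking the difference.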
At Gencove, we’re continuously optimizing our analytics around low-pass sequencing on each species that we work with. As part of this optimization, we have released a pipeline for low-pass sequencing in soybean built on the latest genome release (Wm82.a4) and a diverse haplotype panel that covers the range of soybean genome diversity and includes over 25M genetic variants. This panel enables users to use low-pass sequencing to accurately profile the entire genome in this species.
Existing Gencove customers can access this pipeline by choosing the configuration ‘Soy low-pass Wm82.a4’; below we describe at a high level the generation and validation of this pipeline.
This post is by Jeremy Li and Joe Pickrell
The accuracy of genotype imputation from low-pass sequencing data depends on two key inputs: a haplotype panel representing the diversity of a species, and a statistical model used for imputation, with its corresponding assumptions.
The tool we developed for imputation implements a statistical model that assumes the recombination rate is constant across the genome. However, we know that recombination hotspots exist in many species, and so-called “recombination maps”, which catalogue the variation in recombination rates across the genome, have recently become available for a number of species at very fine resolution.
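To make the contrast concrete, here is a minimal sketch of how the genetic distance between two markers can differ under a constant-rate assumption versus a recombination map. Everything in it is an illustrative assumption rather than code from our imputation tool: the function names, the 1 cM/Mb default rate, and the toy map (which places a hotspot between 1.0 and 1.1 Mb) are all hypothetical.

```python
from bisect import bisect_right


def genetic_position_cm(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) for a physical position (bp).

    map_bp and map_cm are parallel, sorted lists describing a recombination map.
    Positions outside the map are clamped to its first or last entry.
    """
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos_bp) - 1  # map interval containing pos_bp
    frac = (pos_bp - map_bp[i]) / (map_bp[i + 1] - map_bp[i])
    return map_cm[i] + frac * (map_cm[i + 1] - map_cm[i])


def distance_constant_rate(bp_a, bp_b, cm_per_mb=1.0):
    """Genetic distance (cM) assuming one recombination rate genome-wide."""
    return abs(bp_b - bp_a) / 1e6 * cm_per_mb


def distance_from_map(bp_a, bp_b, map_bp, map_cm):
    """Genetic distance (cM) read off a recombination map."""
    return abs(genetic_position_cm(bp_b, map_bp, map_cm)
               - genetic_position_cm(bp_a, map_bp, map_cm))


# Toy map (made-up values): 1 cM accumulates within the 100 kb hotspot.
TOY_MAP_BP = [0, 1_000_000, 1_100_000, 2_000_000]
TOY_MAP_CM = [0.0, 0.5, 1.5, 2.0]

flat = distance_constant_rate(1_000_000, 1_100_000)
mapped = distance_from_map(1_000_000, 1_100_000, TOY_MAP_BP, TOY_MAP_CM)
```

In this toy case the two markers flanking the hotspot are 0.1 cM apart under the constant-rate assumption but 1.0 cM apart according to the map, which is the kind of discrepancy a recombination map is meant to capture.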
The 1000 Genomes Project is currently one of the most widely used genomic datasets, comprising 2504 samples from a diverse range of populations around the world and derived from whole-genome sequencing at an average of ~7.4x coverage as well as whole exome sequencing. Recently, the New York Genome Center resequenced all 2504 samples to an average coverage of 30x and publicly released preliminary genotype callsets for these on the most recent build of the human reference genome (hg38). We processed these data into a haplotype reference panel, which is now available in beta for use in imputation of human low-pass sequencing data on hg38.
This is a guest post by Zach Fuller and Molly Przeworski at Columbia University on how they used low-pass sequencing in the preprint ‘Population genetics of the coral Acropora millepora: Towards a genomic predictor of bleaching’.
The symbiotic relationship between corals and photosynthetic algae (family Symbiodiniaceae) underpins the evolutionary success of these reef-building organisms and the diverse marine ecosystems they support. Ecological stress, such as that brought on by increased seawater temperatures, can break down this symbiotic relationship in a process known as “bleaching”. Because the algal symbionts provide the majority of the energy required by the coral host, prolonged…
Gencove’s low-pass whole genome sequencing plus capture provides the only available, cost-effective solution for prediction of disease risk and pharmacogenomics response from both common and rare variation.
At Gencove we’ve been actively working on transitioning scientists from using genotyping arrays to sequencing. …