Coverage (genetics)

In genetics, coverage is one of several measures of the depth or completeness of DNA sequencing, and is more specifically expressed in any of the following terms:

Sequence coverage (or depth) is the number of unique reads that include a given nucleotide in the reconstructed sequence.^[1]^[2] Deep sequencing refers to the general concept of aiming for a high number of unique reads of each region of a sequence.^[3]
Physical coverage is the cumulative length of reads or read pairs expressed as a multiple of genome size.^[4]
Genomic coverage is the percentage of all base pairs or loci of the genome covered by sequencing.
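
As a rough illustration of how the first and third measures differ, the sketch below computes per-base depth and the fraction of positions covered from a list of aligned read intervals. The function names, the 0-based half-open coordinates, and the toy reads are assumptions made for this example and are not taken from any particular tool.

```python
def per_base_depth(genome_length, reads):
    """Return a list with the number of reads covering each position.

    `reads` is an iterable of (start, end) pairs, 0-based, end exclusive.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in reads:
        delta[start] += 1
        delta[end] -= 1
    depth = []
    running = 0
    for change in delta[:genome_length]:
        running += change
        depth.append(running)
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of positions covered by at least `min_depth` reads."""
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)


# Toy example: a 10 bp reference and three reads (hypothetical coordinates).
reads = [(0, 4), (2, 6), (3, 8)]
depth = per_base_depth(10, reads)
print(depth)                       # [1, 1, 2, 3, 2, 2, 1, 1, 0, 0]
print(sum(depth) / len(depth))     # mean depth: 1.3
print(breadth_of_coverage(depth))  # 0.8, i.e. 80% genomic coverage
```

In this toy case the mean depth is 1.3× while only 80% of positions are covered at all, which shows that the two numbers answer different questions.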

Sequence coverage

Rationale

Even though the sequencing accuracy for each individual nucleotide is very high, the very large number of nucleotides in the genome means that if an individual genome is sequenced only once, there will be a significant number of sequencing errors. Furthermore, many positions in a genome contain rare single-nucleotide polymorphisms (SNPs). Hence, to distinguish between sequencing errors and true SNPs, it is necessary to increase the sequencing accuracy even further by sequencing individual genomes a large number of times.
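
The argument can be made concrete with a small binomial sketch. The per-base error rate, the genome size, and the 20% calling threshold below are illustrative assumptions, not values from the sources cited here. The model also treats errors as independent and counts any miscall, so it is a simplification.

```python
from math import comb


def prob_at_least_k_errors(depth, error_rate, k):
    """P(at least k of `depth` reads are miscalled at one position).

    Assumes independent errors with the same per-base rate (binomial model).
    """
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )


ERROR_RATE = 1e-3              # assumed per-base error rate, illustration only
GENOME_LENGTH = 3_000_000_000  # assumed genome size in base pairs

# A single pass is expected to contain roughly 3 million miscalled bases.
print(GENOME_LENGTH * ERROR_RATE)

for depth in (1, 10, 30):
    # Hypothetical rule: report a variant if at least 20% of reads disagree
    # with the reference (and always at least one read).
    k = max(1, depth // 5)
    print(depth, k, prob_at_least_k_errors(depth, ERROR_RATE, k))
```

Under these assumptions the chance that errors alone reach the threshold at a given position falls steeply as depth grows, which is the reason repeated sequencing separates errors from true SNPs.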

Ultra-deep sequencing

The term "ultra-deep" can sometimes also refer to higher coverage (>100-fold), which allows for detection of sequence variants in mixed populations.^[5]^[6]^[7] In the extreme, error-corrected sequencing approaches such as Maximum-Depth Sequencing can push the coverage of a given region toward the throughput of a sequencing machine, allowing coverages of >10^8.^[8]
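
To see why minority variants call for depths of this order, the following sketch estimates the smallest depth at which a variant present in a given fraction of molecules is expected to appear in at least one read. It assumes reads sample molecules independently and ignores sequencing errors entirely, so it gives a lower bound rather than a practical design rule. The function name and the 95% detection probability are assumptions for illustration.

```python
import math


def min_depth_to_observe(variant_fraction, detection_prob=0.95):
    """Smallest depth d such that a variant carried by `variant_fraction`
    of molecules shows up in at least one read with probability
    `detection_prob`.
    """
    # P(no variant read among d reads) = (1 - f)**d must be <= 1 - p.
    return math.ceil(
        math.log(1 - detection_prob) / math.log(1 - variant_fraction)
    )


for fraction in (0.1, 0.01, 0.001):
    print(fraction, min_depth_to_observe(fraction))
```

For a variant at 1% frequency this simple model already asks for roughly 300 reads at the position, and distinguishing such a variant from sequencing errors requires more.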

Transcriptome sequencing

Deep sequencing of transcriptomes, also known as RNA-Seq, provides both the sequence and frequency of RNA molecules that are present at any particular time in a specific cell type, tissue or organ.^[9] Counting the number of mRNAs that are encoded by individual genes provides an indicator of protein-coding potential, a major contributor to phenotype.^[10] Improving methods for RNA sequencing is an active area of research both in terms of experimental and computational methods.^[11]

Calculation

The average coverage for a whole genome can be calculated from the length of the original genome (G), the number of reads (N), and the average read length (L) as ${\textstyle N\times L/G}$. For example, a hypothetical genome with 2,000 base pairs reconstructed from 8 reads with an average length of 500 nucleotides has 2× redundancy (8 × 500 / 2,000 = 2). This parameter also enables one to estimate other quantities, such as the percentage of the genome covered by reads (sometimes also called breadth of coverage). A high coverage in shotgun sequencing is desired because it can overcome errors in base calling and assembly. The subject of DNA sequencing theory addresses the relationships of such quantities.^[2]
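
A minimal sketch of this calculation follows. The breadth estimate uses the Poisson approximation from the Lander-Waterman model, which assumes reads land uniformly at random and ignores edge effects. That model is brought in here as an assumption and is not spelled out in the text above.

```python
import math


def average_coverage(num_reads, mean_read_length, genome_length):
    """Average coverage N * L / G."""
    return num_reads * mean_read_length / genome_length


def expected_breadth(coverage):
    """Expected fraction of bases covered by at least one read,
    under a Poisson model of read placement (an assumption)."""
    return 1.0 - math.exp(-coverage)


c = average_coverage(num_reads=8, mean_read_length=500, genome_length=2000)
print(c)                              # 2.0
print(round(expected_breadth(c), 3))  # 0.865
```

Under this model, 2× average coverage is expected to leave about 13.5% of bases without any read, which is one reason higher coverage is sought in shotgun sequencing.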

Physical coverage

Sometimes a distinction is made between sequence coverage and physical coverage. Whereas sequence coverage is the average number of times a base is read, physical coverage is the average number of times a base is read or spanned by mate-paired reads.^[2]^[12]^[4]
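
The difference can be sketched for a paired-read library as follows. The example assumes every pair has two reads of equal length and that the fragment length is the outer distance spanned by a pair, including both reads. The library numbers are hypothetical.

```python
def sequence_coverage(num_pairs, read_length, genome_length):
    """Average number of times a base is actually read (two reads per pair)."""
    return 2 * num_pairs * read_length / genome_length


def physical_coverage(num_pairs, mean_fragment_length, genome_length):
    """Average number of times a base is read or spanned by a read pair."""
    return num_pairs * mean_fragment_length / genome_length


# Hypothetical mate-pair library: 1,000,000 pairs of 100 bp reads,
# 3,000 bp fragments, 5,000,000 bp genome.
print(sequence_coverage(1_000_000, 100, 5_000_000))    # 40.0
print(physical_coverage(1_000_000, 3_000, 5_000_000))  # 600.0
```

Because the unsequenced interior of each fragment counts toward physical coverage, long-insert libraries can have a physical coverage far above their sequence coverage.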

Genomic coverage

In terms of genomic coverage and accuracy, whole genome sequencing can broadly be classified into either of the following:^[13]

A draft sequence, covering approximately 90% of the genome at approximately 99.9% accuracy
A finished sequence, covering more than 95% of the genome at approximately 99.99% accuracy

Producing a truly high-quality finished sequence by this definition is very expensive. Thus, most human "whole genome sequencing" results are draft sequences (sometimes above and sometimes below the accuracy defined above).^[13]

References

[1] "Sequencing Coverage". illumina. Illumina education. Retrieved 2020-10-08.
[2] Sims, David; Sudbery, Ian; Ilott, Nicholas E.; Heger, Andreas; Ponting, Chris P. (2014). "Sequencing depth and coverage: key considerations in genomic analyses". Nature Reviews Genetics. 15 (2): 121–132. doi:10.1038/nrg3642. PMID 24434847. S2CID 13325739.
[3] Mardis, Elaine R. (2008-09-01). "Next-Generation DNA Sequencing Methods". Annual Review of Genomics and Human Genetics. 9 (1): 387–402. doi:10.1146/annurev.genom.9.081307.164359. ISSN 1527-8204. PMID 18576944.
[4] Ekblom, Robert; Wolf, Jochen B. W. (2014). "A field guide to whole-genome sequencing, assembly and annotation". Evolutionary Applications. 7 (9): 1026–42. Bibcode:2014EvApp...7.1026E. doi:10.1111/eva.12178. PMC 4231593. PMID 25553065.
[5] Ajay SS, Parker SC, Abaan HO, Fajardo KV, Margulies EH (September 2011). "Accurate and comprehensive sequencing of personal genomes". Genome Res. 21 (9): 1498–505. doi:10.1101/gr.123638.111. PMC 3166834. PMID 21771779.
[6] Mirebrahim, Hamid; Close, Timothy J.; Lonardi, Stefano (2015-06-15). "De novo meta-assembly of ultra-deep sequencing data". Bioinformatics. 31 (12): i9–i16. doi:10.1093/bioinformatics/btv226. ISSN 1367-4803. PMC 4765875. PMID 26072514.
[7] Beerenwinkel, Niko; Zagordi, Osvaldo (2011-11-01). "Ultra-deep sequencing for the analysis of viral populations". Current Opinion in Virology. 1 (5): 413–418. doi:10.1016/j.coviro.2011.07.008. PMID 22440844.
[8] Jee, J.; Rasouly, A.; Shamovsky, I.; Akivis, Y.; Steinman, S.; Mishra, B.; Nudler, E. (2016). "Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing". Nature. 534 (7609): 693–696. Bibcode:2016Natur.534..693J. doi:10.1038/nature18313. PMC 4940094. PMID 27338792.
[9] Malone, John H.; Oliver, Brian (2011-01-01). "Microarrays, deep sequencing and the true measure of the transcriptome". BMC Biology. 9: 34. doi:10.1186/1741-7007-9-34. ISSN 1741-7007. PMC 3104486. PMID 21627854.
[10] Hampton M, Melvin RG, Kendall AH, Kirkpatrick BR, Peterson N, Andrews MT (2011). "Deep sequencing the transcriptome reveals seasonal adaptive mechanisms in a hibernating mammal". PLOS ONE. 6 (10): e27021. Bibcode:2011PLoSO...627021H. doi:10.1371/journal.pone.0027021. PMC 3203946. PMID 22046435.
[11] Heyer EE, Ozadam H, Ricci EP, Cenik C, Moore MJ (2015). "An optimized kit-free method for making strand-specific deep sequencing libraries from RNA fragments". Nucleic Acids Res. 43 (1): e2. doi:10.1093/nar/gku1235. PMC 4288154. PMID 25505164.
[12] Meyerson, M.; Gabriel, S.; Getz, G. (2010). "Advances in understanding cancer genomes through second-generation sequencing". Nature Reviews Genetics. 11 (10): 685–696. doi:10.1038/nrg2841. PMID 20847746. S2CID 2544266.
[13] Kris A. Wetterstrand, M.S. "The Cost of Sequencing a Human Genome". National Human Genome Research Institute. Last updated: November 1, 2021.
