Genome project

From Wikipedia, the free encyclopedia
When printed, the human genome sequence fills around 100 huge books of close print

Genome projects are scientific endeavours that ultimately aim to determine the complete genome sequence of an organism (be it an animal, a plant, a fungus, a bacterium, an archaean, a protist or a virus) and to annotate protein-coding genes and other important genome-encoded features.[1] The genome sequence of an organism includes the collective DNA sequences of each chromosome in the organism. For a bacterium containing a single chromosome, a genome project will aim to map the sequence of that chromosome. For the human species, whose genome includes 22 pairs of autosomes and 2 sex chromosomes, a complete genome sequence will involve 46 separate chromosome sequences.

The Human Genome Project is a well-known example of a genome project.[2]

Genome assembly

Genome assembly refers to the process of taking a large number of short DNA sequences and reassembling them to create a representation of the original chromosomes from which the DNA originated. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fractured into millions of small pieces. These pieces are then "read" by automated sequencing machines. A genome assembly algorithm works by taking all the pieces and aligning them to one another, and detecting all places where two of the short sequences, or reads, overlap. These overlapping reads can be merged, and the process continues.
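
The overlap-and-merge idea can be illustrated with a toy greedy assembler. This is only a minimal sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical and do not correspond to any particular assembly package.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Stops when no remaining pair overlaps by at least `min_len` bases
    and returns whatever contigs are left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three hypothetical reads sampled from one short made-up sequence.
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
```

Comparing every pair of reads, as this sketch does, scales poorly with the number of reads, which hints at why assembly of real data sets is computationally demanding.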

Genome assembly is a very difficult computational problem, made more difficult because many genomes contain large numbers of identical sequences, known as repeats. These repeats can be thousands of nucleotides long, and occur at different locations, especially in the large genomes of plants and animals.
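
One way to see why repeats cause trouble is to count how often each substring of a given length occurs: a read that falls entirely inside a repeated stretch matches more than one location equally well. The sketch below assumes a made-up example sequence, and the helper name `repeated_kmers` is hypothetical.

```python
from collections import Counter


def repeated_kmers(sequence: str, k: int) -> dict[str, int]:
    """Return every length-k substring that occurs more than once, with its count."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


if __name__ == "__main__":
    # "ACGT" appears at two positions, so a 4-base read "ACGT"
    # cannot be placed unambiguously in this toy sequence.
    print(repeated_kmers("ACGTTTGACGTA", 4))
```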

The resulting (draft) genome sequence is produced by combining the information from sequenced contigs and then employing linking information to create scaffolds. Scaffolds are positioned along the physical map of the chromosomes, creating a "golden path".

Assembly software

Originally, most large-scale DNA sequencing centers developed their own software for assembling the sequences that they produced. However, this has changed as the software has grown more complex and as the number of sequencing centers has increased. One example of such an assembler is the Short Oligonucleotide Analysis Package, developed by BGI for de novo assembly of human-sized genomes, alignment, SNP detection, resequencing, indel finding, and structural variation analysis.[3][4][5]

Genome annotation

Since the 1980s, molecular biology and bioinformatics have created the need for DNA annotation. DNA annotation or genome annotation is the process of attaching biological information to sequences, in particular identifying the locations of genes and determining what those genes do.
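
As a rough illustration of locating candidate genes, the sketch below scans the forward strand for open reading frames (a start codon followed in the same frame by a stop codon). This is a deliberately naive stand-in, not how any real annotation pipeline works: it ignores the reverse strand, introns, and all statistical evidence. The name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) half-open coordinates of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `min_codons` counts codons from the ATG up to, but not including, the stop.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # include the stop codon
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Made-up sequence containing one short ORF: ATG AAA CCC TAG
    print(find_orfs("GGATGAAACCCTAGTT"))
```

The coordinates returned only say where a gene might be; determining what such a gene does requires further evidence, for example comparison with known sequences.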

Time of completion

When sequencing a genome, there are usually regions that are difficult to sequence (often regions with highly repetitive DNA). Thus, 'completed' genome sequences are rarely ever complete, and terms such as 'working draft' or 'essentially complete' have been used to more accurately describe the status of such genome projects. Even when every base pair of a genome sequence has been determined, there are still likely to be errors present because DNA sequencing is not a completely accurate process. It could also be argued that a complete genome project should include the sequences of mitochondria and (for plants) chloroplasts, as these organelles have their own genomes.

It is often reported that the goal of sequencing a genome is to obtain information about the complete set of genes in that particular genome sequence. The proportion of a genome that encodes genes may be very small (particularly in eukaryotes such as humans, where coding DNA may only account for a few percent of the entire sequence). However, it is not always possible (or desirable) to sequence only the coding regions separately. Also, as scientists understand more about the role of this noncoding DNA (often referred to as junk DNA), it will become more important to have a complete genome sequence as a background to understanding the genetics and biology of any given organism.

In many ways genome projects do not confine themselves to only determining a DNA sequence of an organism. Such projects may also include gene prediction to find out where the genes are in a genome, and what those genes do. There may also be related projects to sequence ESTs or mRNAs to help find out where the genes actually are.

Historical and technological perspectives

Historically, when sequencing eukaryotic genomes (such as the worm Caenorhabditis elegans) it was common to first map the genome to provide a series of landmarks across the genome. Rather than sequence a chromosome in one go, it would be sequenced piece by piece (with the prior knowledge of approximately where that piece is located on the larger chromosome). Changes in technology, and in particular improvements to the processing power of computers, mean that genomes can now be 'shotgun sequenced' in one go (though there are caveats to this approach when compared to the traditional approach).

Improvements in DNA sequencing technology have meant that the cost of sequencing a new genome has steadily fallen (in terms of cost per base pair), and newer technology has also meant that genomes can be sequenced far more quickly.

When research agencies decide what new genomes to sequence, the emphasis has been on species which are either of high importance as model organisms or have a relevance to human health (e.g. pathogenic bacteria or vectors of disease such as mosquitos) or species which have commercial importance (e.g. livestock and crop plants). Secondary emphasis is placed on species whose genomes will help answer important questions in molecular evolution (e.g. the common chimpanzee).

In the future, it is likely that it will become even cheaper and quicker to sequence a genome. This will allow for complete genome sequences to be determined from many different individuals of the same species. For humans, this will allow us to better understand aspects of human genetic diversity.

Examples

L1 Dominette 01449, the Hereford who serves as the subject of the Bovine Genome Project
The Giant Sequoia genome sequence was extracted from a single fertilized seed harvested from a 1,360-year-old tree in Sequoia/Kings Canyon National Park.

Many organisms have genome projects that have either been completed or will be completed shortly, including:

References

  1. Pevsner, Jonathan (2009). Bioinformatics and functional genomics (2nd ed.). Hoboken, N.J: Wiley-Blackwell. ISBN 9780470085851.
  2. "Potential Benefits of Human Genome Project Research". Department of Energy, Human Genome Project Information. 2009-10-09. Archived from the original on 2013-07-08. Retrieved 2010-06-18.
  3. Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J (February 2010). "De novo assembly of human genomes with massively parallel short read sequencing". Genome Research. 20 (2): 265–272. doi:10.1101/gr.097261.109. ISSN 1549-5469. PMC 2813482. PMID 20019144.
  4. Rasmussen M, Li Y, Lindgreen S, Pedersen JS, Albrechtsen A, Moltke I, Metspalu M, Metspalu E, Kivisild T, Gupta R, Bertalan M, Nielsen K, Gilbert MT, Wang Y, Raghavan M, Campos PF, Kamp HM, Wilson AS, Gledhill A, Tridico S, Bunce M, Lorenzen ED, Binladen J, Guo X, Zhao J, Zhang X, Zhang H, Li Z, Chen M, Orlando L, Kristiansen K, Bak M, Tommerup N, Bendixen C, Pierre TL, Grønnow B, Meldgaard M, Andreasen C, Fedorova SA, Osipova LP, Higham TF, Ramsey CB, Hansen TV, Nielsen FC, Crawford MH, Brunak S, Sicheritz-Pontén T, Villems R, Nielsen R, Krogh A, Wang J, Willerslev E (2010-02-11). "Ancient human genome sequence of an extinct Palaeo-Eskimo". Nature. 463 (7282): 757–762. Bibcode:2010Natur.463..757R. doi:10.1038/nature08835. ISSN 1476-4687. PMC 3951495. PMID 20148029.
  5. Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, Guo Y, Feng B, Li H, Lu Y, Fang X, Liang H, Du Z, Li D, Zhao Y, Hu Y, Yang Z, Zheng H, Hellmann I, Inouye M, Pool J, Yi X, Zhao J, Duan J, Zhou Y, Qin J, Ma L, Li G, Yang Z, Zhang G, Yang B, Yu C, Liang F, Li W, Li S, Li D, Ni P, Ruan J, Li Q, Zhu H, Liu D, Lu Z, Li N, Guo G, Zhang J, Ye J, Fang L, Hao Q, Chen Q, Liang Y, Su Y, San A, Ping C, Yang S, Chen F, Li L, Zhou K, Zheng H, Ren Y, Yang L, Gao Y, Yang G, Li Z, Feng X, Kristiansen K, Wong GK, Nielsen R, Durbin R, Bolund L, Zhang X, Li S, Yang H, Wang J (2008-11-06). "The diploid genome sequence of an Asian individual". Nature. 456 (7218): 60–65. Bibcode:2008Natur.456...60W. doi:10.1038/nature07484. ISSN 0028-0836. PMC 2716080. PMID 18987735.
  6. Ghosh, Pallab (23 April 2015). "Mammoth genome sequence completed". BBC News.
  7. Yates, Diana (2009-04-23). "What makes a cow a cow? Genome sequence sheds light on ruminant evolution" (Press Release). EurekAlert!. Retrieved 2012-12-22.
  8. Elsik, C. G.; Elsik, R. L.; Tellam, K. C.; Worley, R. A.; Gibbs, D. M.; Muzny, G. M.; Weinstock, D. L.; Adelson, E. E.; Eichler, L.; Elnitski, R.; Guigó, D. L.; Hamernik, S. M.; Kappes, H. A.; Lewin, D. J.; Lynn, F. W.; Nicholas, A.; Reymond, M.; Rijnkels, L. C.; Skow, E. M.; Zdobnov, L.; Schook, J.; Womack, T.; Alioto, S. E.; Antonarakis, A.; Astashyn, C. E.; Chapple, H. -C.; Chen, J.; Chrast, F.; Câmara, O.; et al. (2009). "The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution". Science. 324 (5926): 522–528. Bibcode:2009Sci...324..522A. doi:10.1126/science.1169588. PMC 2943200. PMID 19390049.
  9. "2007 Release: Horse Genome Assembled". National Human Genome Research Institute (NHGRI). Retrieved 19 April 2018.
  10. Scott, Alison D; Zimin, Aleksey V; Puiu, Daniela; Workman, Rachael; Britton, Monica; Zaman, Sumaira; Caballero, Madison; Read, Andrew C; Bogdanove, Adam J; Burns, Emily; Wegrzyn, Jill; Timp, Winston; Salzberg, Steven L; Neale, David B (November 1, 2020). "A Reference Genome Sequence for Giant Sequoia". G3: Genes, Genomes, Genetics. 10 (11): 3907–3919. doi:10.1534/g3.120.401612. PMC 7642918. PMID 32948606.