The Sequence of the Human Genome
Journal article
By: John Craig Venter
Date: February 16, 2001
Source: J. C. Venter, et al. "The Sequence of the Human Genome." Science, vol. 291, no. 5507 (February 16, 2001): 1304-51.
About the Author: J. Craig Venter first became involved in genomic studies as a researcher at the National Institutes of Health. In 1992, Venter co-founded The Institute for Genomic Research (TIGR). At TIGR, his group sequenced the first complete genome of a free-living organism, the bacterium Haemophilus influenzae, using a method known as shotgun sequencing, which he later used to sequence other genomes. After becoming president of Celera Genomics in 1998, Venter and his team sequenced the fruit fly, mouse, rat, and human genomes. In 2003, Venter undertook a global expedition to collect and study microbes from the Earth's varied environments. This work is carried out by the J. Craig Venter Institute, formed in 2004.
INTRODUCTION
Sequencing the human genome, that is, determining the order of the nucleotides (the chemical building blocks of DNA) that make up the entire genetic material of a human, was first proposed in 1985. In 1988, the Human Genome Organization was established to coordinate the international effort to sequence the genome. While the United States was the largest contributor, other countries such as Japan, Germany, the United Kingdom, France, and China established their own human genome projects and also contributed to the global sequencing effort.
In 1990, the Human Genome Project was launched in the United States with the aim of sequencing, within fifteen years, the entire human genome of about 2.9 billion base pairs. (A base pair consists of two complementary nucleotide bases, one on each strand, that bond to each other and hold the two strands of the DNA double helix together.)
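Because A always pairs with T and G always pairs with C, either strand of the double helix fully determines the other, which is why a genome sequence is normally reported for one strand only and its length is given in base pairs. The short Python sketch below illustrates that pairing rule; the function name `complementary_strand` is hypothetical, and the sketch ignores real-world complications such as ambiguous or chemically modified bases.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand written 5'->3' (the reverse complement).

    The two strands of DNA run in opposite directions, so the partner
    strand is read backwards relative to the input strand.
    """
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    top = "ATGCGT"
    bottom = complementary_strand(top)
    # Each base on one strand forms exactly one base pair,
    # so a 6-base strand corresponds to 6 base pairs.
    print(top, bottom, len(top))  # prints: ATGCGT ACGCAT 6
```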
One obvious challenge facing scientists involved in the project was how to sequence the huge number of base pairs and collate all of the sequencing data. In the mid-1980s, small-scale sequencing was routine. However, a project as large as sequencing the human genome was expected to require novel approaches; otherwise it would take much longer than the fifteen-year goal. Essential new techniques included a sequencing method that used fluorescent dyes to tag the different bases; high-throughput sequencing, which read DNA sequences about twelve times faster than older technology; and computing tools able to compile the data into a single sequence.
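The idea behind four-color fluorescent sequencing can be illustrated with a toy base caller: because each base carries its own dye, the base at each signal peak is taken to be the one whose dye shines brightest. This Python sketch is a simplification under stated assumptions: the dye-channel order, the `min_signal` threshold, and the intensity values are all hypothetical, and real base-calling software also models peak spacing, signal decay, and per-base quality.

```python
from typing import List, Sequence

# Hypothetical assignment of one fluorescent dye channel per base.
DYE_CHANNELS = ("A", "C", "G", "T")


def call_bases(peaks: Sequence[Sequence[float]], min_signal: float = 0.1) -> str:
    """Call one base per peak from its four dye intensities.

    Each peak is an (A, C, G, T) intensity tuple. The base whose dye is
    brightest is called; if even the brightest signal is below min_signal,
    the position is reported as 'N' (unknown).
    """
    calls: List[str] = []
    for peak in peaks:
        if len(peak) != len(DYE_CHANNELS):
            raise ValueError("each peak needs exactly four dye intensities")
        brightest = max(range(len(DYE_CHANNELS)), key=lambda i: peak[i])
        calls.append(DYE_CHANNELS[brightest] if peak[brightest] >= min_signal else "N")
    return "".join(calls)


if __name__ == "__main__":
    # Invented intensity readings for four consecutive peaks.
    peaks = [
        (0.90, 0.10, 0.00, 0.10),  # A dye brightest
        (0.00, 0.10, 0.80, 0.20),  # G dye brightest
        (0.05, 0.02, 0.03, 0.04),  # every signal too weak
        (0.10, 0.20, 0.10, 0.70),  # T dye brightest
    ]
    print(call_bases(peaks))  # prints: AGNT
```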
The first step in establishing the sequence was to subdivide the genome into manageable pieces by cloning its fragments. Cloning was performed by inserting fragments of human DNA into vectors, small DNA molecules such as plasmids that are able to multiply inside a cell. The inserted fragments were sequenced, and their sequences were combined with data from genetic maps to eventually create a full genome sequence.
A full human genetic map was constructed using polymorphic markers (DNA sequences that occur in variant forms among individuals) spaced closely and evenly throughout the genome. The Human Genome Project generated over 10,000 polymorphic markers to create this framework map. Previously cloned and mapped human genes were also placed on the framework.
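A framework map is useful because a cloned fragment that contains markers with known map positions can be anchored, and ordered, along the chromosome. The Python sketch below assumes one simple placement rule, putting each clone at the average position of the markers it contains; the marker names, positions, and clone names are invented for illustration, and real map integration must also resolve conflicting or ambiguous marker hits.

```python
from statistics import mean
from typing import Dict, List, Tuple


def order_on_framework(
    marker_positions: Dict[str, float],
    clone_markers: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Anchor clones to a framework map and sort them along the chromosome.

    Each clone is placed at the mean map position of the markers it contains.
    Clones with no marker on the framework map cannot be anchored and are
    omitted from the result.
    """
    placed: List[Tuple[str, float]] = []
    for clone, markers in clone_markers.items():
        positions = [marker_positions[m] for m in markers if m in marker_positions]
        if positions:
            placed.append((clone, mean(positions)))
    return sorted(placed, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical framework map: marker name -> position along one chromosome.
    framework = {"M1": 2.0, "M2": 7.5, "M3": 12.0, "M4": 18.5}
    # Hypothetical clones and the markers detected in each.
    clones = {
        "cloneA": ["M3", "M4"],
        "cloneB": ["M1"],
        "cloneC": ["M2", "M3"],
        "cloneD": ["M9"],  # marker not on the framework, so cloneD stays unplaced
    }
    for name, position in order_on_framework(framework, clones):
        print(name, position)
```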
The Human Genome Project was publicly funded, and its sequences were released to the public on a regular basis. At the same time, commercial companies also began sequencing the genome, often taking advantage of the publicly available data in their own projects. Among the commercial competitors to the public project, the most prominent was J. C. Venter's Celera Genomics. Celera's approach to genome sequencing was very different from the map-based public effort: it proposed to apply shotgun sequencing (sequencing DNA that has been randomly broken into fragments) to the whole genome and then reassemble the pieces. This approach was widely criticized and was generally expected to fail; putting the huge number of sequence fragments in the correct order was seen as its main obstacle. However, the method proved successful when Celera used it to sequence the genome of the fruit fly Drosophila melanogaster in 2000.
The shotgun sequencing method relied on generating high-quality DNA clone libraries, from which all of the inserts were sequenced without first locating the DNA fragments on the chromosomes. The sequences were then assembled by combining overlapping shorter fragments into longer continuous stretches of sequence called contigs. The fragments that Celera assembled in this way were typically 1,000 to 10,000 base pairs long. These larger contigs were then localized to the chromosomes and eventually formed a full genome sequence.
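The assembly step can be illustrated with a toy greedy overlap assembler: it repeatedly merges the two fragments whose end and start overlap the most, and whatever fragments remain when no sufficient overlap is left are the contigs. This Python sketch is a deliberately simplified stand-in, not Celera's assembler; it assumes error-free reads taken from a single strand, and the `min_len` overlap threshold and example reads are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap is at least min_len bases long.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Toy greedy assembly: merge the best-overlapping pair until none remain.

    The fragments left at the end are the contigs. Assumes error-free reads
    from one strand; real assemblers must also handle sequencing errors,
    reverse-complement reads, and repeats.
    """
    frags = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads wholly contained in a longer read; they add no new sequence.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return frags
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


if __name__ == "__main__":
    # Invented overlapping reads from the hypothetical sequence ATTAGACCTGCCGGAATAC.
    reads = ["GCCGGAATAC", "ATTAGACCTG", "CCTGCCGGA"]
    print(greedy_assemble(reads))  # prints: ['ATTAGACCTGCCGGAATAC']
```

Greedy merging like this is easily misled by repeated sequences, which can make two distant copies of a repeat look like a single overlap. That weakness is one reason ordering the fragments correctly was seen as the main obstacle to whole-genome shotgun sequencing.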
During testimony before Congress on the future of the Human Genome Project in April 2000, Venter announced that his group had finished sequencing the human genome. His article "The Sequence of the Human Genome," describing the assembled sequence, was published in Science ten months later.
PRIMARY SOURCE
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort.
… Decoding of the DNA that constitutes the human genome has been widely anticipated for the contribution it will make toward understanding human evolution, the causation of disease, and the interplay between the environment and heredity in defining the human condition. A project with the goal of determining the complete nucleotide sequence of the human genome was first formally proposed in 1985 (1). In subsequent years, the idea met with mixed reactions in the scientific community (2). However, in 1990, the Human Genome Project (HGP) was officially initiated in the United States under the direction of the National Institutes of Health and the U.S. Department of Energy with a 15-year, $3 billion plan for completing the genome sequence. In 1998, we announced our intention to build a unique genome sequencing facility, to determine the sequence of the human genome over a 3-year period. Here we report the penultimate milestone along the path toward that goal, a nearly complete sequence of the euchromatic portion of the human genome. The sequencing was performed by a whole-genome random shotgun method with subsequent assembly of the sequenced segments….
… It has been predicted for the last 15 years that complete sequencing of the human genome would open up new strategies for human biological research and would have a major impact on medicine, and through medicine and public health, on society. Effects on biomedical research are already being felt. This assembly of the human genome sequence is but a first, hesitant step on a long and exciting journey toward understanding the role of the genome in human biology. It has been possible only because of innovations in instrumentation and software that have allowed automation of almost every step of the process from DNA preparation to annotation. The next steps are clear: We must define the complexity that ensues when this relatively modest set of about 30,000 genes is expressed. The sequence provides the framework upon which all the genetics, biochemistry, physiology, and ultimately phenotype depend. It provides the boundaries for scientific inquiry. The sequence is only the first level of understanding of the genome. All genes and their control elements must be identified; their functions, in concert as well as in isolation, defined; their sequence variation worldwide described; and the relation between genome variation and specific phenotypic characteristics determined. Now we know what we have to explain. Another paramount challenge awaits: public discussion of this information and its potential for improvement of personal health. Many diverse sources of data have shown that any two individuals are more than 99.9% identical in sequence, which means that all the glorious differences among individuals in our species that can be attributed to genes falls in a mere 0.1% of the sequence. 
There are two fallacies to be avoided: determinism, the idea that all characteristics of the person are "hard-wired" by the genome; and reductionism, the view that with complete knowledge of the human genome sequence, it is only a matter of time before our understanding of gene functions and interactions will provide a complete causal description of human variability. The real challenge of human biology, beyond the task of finding out how genes orchestrate the construction and maintenance of the miraculous mechanism of our bodies, will lie ahead as we seek to explain how our minds have come to organize thoughts sufficiently well to investigate our own existence.
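The figures in the abstract quoted above can be checked with simple arithmetic: coverage is the total number of bases sequenced divided by the genome length, and dividing the same total by the number of reads gives the average read length. The Python sketch below also applies the Lander-Waterman (Poisson) idealization, a standard estimate that is not taken from the paper itself, under which the expected fraction of bases missed by randomly placed reads at coverage c is e^(-c). The function name `shotgun_stats` is hypothetical.

```python
import math
from typing import Dict


def shotgun_stats(genome_bp: float, sequenced_bp: float, n_reads: int) -> Dict[str, float]:
    """Basic whole-genome shotgun statistics.

    coverage: average number of reads covering each base (sequenced / genome).
    mean_read_len: average bases per read.
    fraction_unsequenced: Lander-Waterman/Poisson estimate e^(-coverage) of the
    fraction of bases hit by no read, assuming reads land uniformly at random.
    """
    coverage = sequenced_bp / genome_bp
    return {
        "coverage": coverage,
        "mean_read_len": sequenced_bp / n_reads,
        "fraction_unsequenced": math.exp(-coverage),
    }


if __name__ == "__main__":
    # Rounded totals quoted in the abstract above.
    stats = shotgun_stats(genome_bp=2.91e9, sequenced_bp=14.8e9, n_reads=27_271_853)
    for name, value in stats.items():
        print(f"{name}: {value:.3g}")
```

With the rounded inputs from the abstract, this gives roughly 5.1-fold coverage, an average read of about 543 bases, and about 0.6 percent of bases expected to be missed; the small difference from the reported 5.11-fold presumably reflects rounding of the quoted totals. Real genomes contain repeats and regions that are hard to clone, so the Poisson figure is only an idealized estimate.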
SIGNIFICANCE
At the same time as Venter's group, the International Human Genome Sequencing Consortium reported its version of the human genome sequence in an article in the journal Nature. Both papers described in detail how each group sequenced the genome and analyzed its structure. Together, the two papers marked the end of the sequencing race between the public and private human genome sequencing groups.
Comparison of the sequences produced by the two groups revealed that both assemblies still contained fairly large gaps of unknown sequence. The public Human Genome Project sequence had more numerous but shorter gaps than the Celera sequence. Moreover, some large, previously known genes were not found on a single contig. In addition, analysis using DNA fragments synthesized in the laboratory revealed that about 0.14 percent of the sequence was not shared by the two genomes. In 2003, the Human Genome Project published (and made available on the Internet) its final, more complete version of the human genome, along with a database of common sequence variations that distinguish one individual from another.
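The difference between "more numerous but shorter" gaps and fewer, longer ones can be made concrete by counting runs of unknown bases in an assembly. This Python sketch assumes the common convention of writing gaps as runs of the letter N in a FASTA-style sequence; the function names and the two example sequences are invented and do not come from either genome assembly.

```python
import re
from typing import Dict, List


def gap_lengths(assembly: str) -> List[int]:
    """Return the length of every run of 'N' (unknown bases), in order."""
    return [len(match.group()) for match in re.finditer(r"N+", assembly.upper())]


def gap_summary(assembly: str) -> Dict[str, int]:
    """Count gaps, total unknown bases, and the longest gap."""
    gaps = gap_lengths(assembly)
    return {
        "gaps": len(gaps),
        "unknown_bases": sum(gaps),
        "longest_gap": max(gaps, default=0),
    }


if __name__ == "__main__":
    # Two invented toy sequences: several short gaps versus one long gap.
    many_short = "ACGTNNACGTNNNACGTNACGT"
    one_long = "ACGT" + "N" * 8 + "ACGTACGT"
    print(gap_summary(many_short))  # {'gaps': 3, 'unknown_bases': 6, 'longest_gap': 3}
    print(gap_summary(one_long))    # {'gaps': 1, 'unknown_bases': 8, 'longest_gap': 8}
```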
Now that the first phase, determining the sequence of the human genome, is complete, scientists are beginning to learn the functions of individual genes. An important aim of genetics research is to understand how genes are regulated, especially in relation to disease development. Since the genome was published, scientists have identified errors, or mutations, in genes that play a major role in the development of complex diseases, such as Parkinson's disease and some prostate and colon cancers. Researchers hope that in the future many diseases caused by gene mutations will become curable through gene therapy, which aims to replace malfunctioning genes with normally functioning versions.
Completion of the human genome sequence also initiated a series of ethical discussions on the use of genetic information. The main areas of concern are the potential use of knowledge to discriminate against a person due to their genetic makeup, and the use of the genome to design genetically superior babies.
The international collaborative effort that resulted in the sequencing of the human genome provided a fundamental base for biomedicine in the twenty-first century. That base has also yielded clues about the evolution of humans as a species. Scientists studying the 2003 human genome sequence identified about 1,000 new genes that arose in the human lineage after it diverged from that of rodents, about 75 million years ago.
FURTHER RESOURCES
Books
Dennis, Carina, and Richard Gallagher, eds. The Human Genome. New York: Palgrave Macmillan, 2002.
Periodicals
International Human Genome Sequencing Consortium. "Initial Sequencing and Analysis of the Human Genome." Nature, vol. 409 (2001): 860-921.
Web sites
Nature. "Genome Gateway." 〈http://www.nature.com/genomics/human/index.html〉 (accessed September 15, 2005).
Science. "Functional Genomics: The Human Genome." 〈http://www.sciencemag.org/feature/plus/sfg/human/index.shtml〉 (accessed September 15, 2005).