Genetic Code
The sequence of nucleotides in DNA determines the sequence of amino acids found in all proteins. Since there are only four nucleotide "letters" in the DNA alphabet (A, C, G, T, which stand for adenine, cytosine, guanine, and thymine), but there are 20 different amino acids in the protein alphabet, it is clear that more than one nucleotide must be used to specify an amino acid. Even two nucleotides read at a time would not give sufficient combinations (4 × 4 = 16) to encode all 20 amino acids plus start and stop signals. Therefore it would require a minimum of three DNA nucleotides to "spell out" one amino acid, and indeed this is the number that is actually used. RNA also uses a four letter alphabet when it reads and transcribes DNA instructions during protein synthesis, but its set of nucleotides is somewhat different, substituting U (uracil) for T (thymine).
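The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `min_codon_length` and the choice of 21 required meanings (20 amino acids plus at least one stop signal) are assumptions made for the example.

```python
def min_codon_length(alphabet_size: int, meanings: int) -> int:
    """Smallest word length n such that alphabet_size ** n >= meanings."""
    n = 1
    while alphabet_size ** n < meanings:
        n += 1
    return n


# 20 amino acids plus at least one stop signal (assumed count for this sketch).
print(min_codon_length(4, 21))  # 3
# Two-letter words give only 4 * 4 = 16 combinations.
print(4 ** 2, 4 ** 3)           # 16 64
```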
Any single set of three nucleotides is called a codon, and the set of all possible three-nucleotide combinations is called "the genetic code" or "triplet code." There are sixty-four different combinations or codons (4 × 4 × 4 = 64). We now know that three codons (UAA, UAG, and UGA) specify a "stop" signal, indicating the termination of the polypeptide chain being synthesized on the ribosome. Each of the remaining sixty-one codons encodes an amino acid. The "start" signal is the codon AUG, which also encodes the amino acid methionine. The codons are read from the messenger RNA molecule during protein synthesis, and, consequently, they are given in RNA bases rather than in the original DNA sequence. The reading of the codons is shown in Figure 1.
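As a rough illustration of these counts, the sixty-four codons can be enumerated directly. The variable names here are chosen for the example only.

```python
from itertools import product

RNA_BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

# Every ordered triplet of the four RNA bases.
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]
# Codons that encode an amino acid rather than a stop signal.
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))                  # 64
print(len(sense_codons))            # 61
print(START_CODON in sense_codons)  # True: AUG also encodes methionine
```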
Translation
The gene is represented by the sequences of bases in the DNA molecule, which can, in a sense, be thought of as a "storage molecule" for genetic information. DNA is extremely stable, a property critical to the maintenance of the integrity of the gene. This stability is evidenced by the fact that DNA has been extracted from Egyptian mummies and extinct animals such as the woolly mammoth. It can be extracted from dried blood or from a single hair at a crime scene.
Each cell contains a complete set of genes, but only certain of these genes are active or "expressed" at any one time. When a gene is active, a "disposable" copy is transcribed from the gene into codons contained in a messenger RNA (mRNA) molecule. Unlike the DNA molecule, the mRNA molecule is relatively unstable and short-lived. This is so that when a gene is turned off, the mRNA does not remain in the cell forever, running off more proteins on the ribosomes that are no longer needed by the cell.
Another RNA molecule, called transfer RNA (tRNA), contains a specific region called the anticodon. The tRNA anticodon can base pair with the codon region of the mRNA during protein synthesis, using the base pairing rules of A-U, U-A, C-G, and G-C. Each tRNA carries a specific amino acid. Thus the tRNA carrying methionine has a UAC anticodon that pairs with the AUG codon of the mRNA bound to the ribosome. Similarly the tRNA for proline has a GGA anticodon.
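The pairing rules can be written as a small lookup. This sketch follows the convention used in the paragraph above, where the anticodon is written base for base under the codon; many references instead write anticodons in the 5' to 3' direction, which reverses the string. The function name `anticodon_for` is hypothetical.

```python
# Base pairing rules from the text: A-U, U-A, C-G, G-C.
PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon_for(codon: str) -> str:
    """Return the position-by-position complement of an mRNA codon."""
    return "".join(PAIRS[base] for base in codon)


print(anticodon_for("AUG"))  # UAC, the methionine example above
print(anticodon_for("CCU"))  # GGA, pairing with a proline codon
```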
In examining the table of codons (Table 1) you will see that there is more than one codon for each amino acid, except for methionine (AUG) and tryptophan (UGG). Different codons that code for the same amino acid are said to be "synonyms," and the code is said to be "degenerate" in the sense that there is not a single, unique codon for each of the twenty amino acids.
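The degeneracy described here can be tallied from the standard codon table. The sketch below builds that table from a compact 64-letter string of one-letter amino acid codes (with `*` for stop), ordered with the bases U, C, A, G at each position; this layout is an assumption of the example, not something taken from Table 1.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons for each amino acid (stop codons excluded).
counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in counts.items() if n == 1)

print(len(counts))           # 20 amino acids
print(sum(counts.values()))  # 61 sense codons
print(single_codon)          # ['M', 'W']: methionine and tryptophan
```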
The "Wobble" Hypothesis
Even before the genetic code had been elucidated, Francis Crick postulated that base pairing of the mRNA codons with the tRNA anticodons would require precision in the first two nucleotide positions but not in the third. (The precise conformation of base pairs, which refers to the hydrogen bonding between A-T (A-U in RNA) and C-G pairs, is known as Watson-Crick base pairing.) The third position, in general, would need to be only a purine (A or G) or a pyrimidine (C or U). Crick called this phenomenon "wobble."
This less-than-precise base pairing would require fewer tRNA species. For example, the tRNA for glutamic acid (tRNA-Glu) could pair with either GAA or GAG codons. In looking at the codon table, one can see that, for the most part, the first two letters are important to specify the particular amino acid. The only exceptions are AUG (Met) and UGG (Trp) which, as indicated above, have only one codon each.
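The third-position pattern can be checked against the standard code. The following sketch, which uses the same compact table layout as an assumption (bases ordered U, C, A, G), lists the two-base prefixes for which swapping one pyrimidine for the other, or one purine for the other, in the third position changes the meaning of the codon.

```python
from itertools import product

BASES = "UCAG"
# Standard code as one-letter amino acid codes, '*' for stop, in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

prefixes = ["".join(p) for p in product(BASES, repeat=2)]

# Prefixes where U and C in the third position give different meanings.
pyrimidine_exceptions = [
    p for p in prefixes if CODON_TABLE[p + "U"] != CODON_TABLE[p + "C"]
]
# Prefixes where A and G in the third position give different meanings.
purine_exceptions = [
    p for p in prefixes if CODON_TABLE[p + "A"] != CODON_TABLE[p + "G"]
]

print(pyrimidine_exceptions)  # []
print(purine_exceptions)      # ['UG', 'AU']
```

The two purine exceptions correspond to UGA/UGG (stop versus tryptophan) and AUA/AUG (isoleucine versus methionine), which matches the single-codon amino acids named above.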
The Code Has No Gaps or Overlaps
The 1960s were an exciting time for molecular biologists, for it was then that the genetic code was broken. Several possibilities had to be considered for the genetic code. First, it was possible that the code had gaps, that is, some sort of punctuation mark or a "spacer" nucleotide or nucleotides between coding groups. Second, the code could be either overlapping or nonoverlapping. These possibilities are illustrated in Figures 2 and 3. An overlapping code would have the advantage that more information could be contained in a smaller space.
However, in overlapping code a mutation that changed one base would lead to the changing of three consecutive amino acids in the protein sequence. Genetic evidence, available even before the code had been deciphered, indicated that a single point mutation, that is, a change in a single nucleotide, affected only one amino acid and thus suggested a nonoverlapping code.
Another possibility was that the code had punctuation marks, that is, a base (indicated by "p" in Figure 3) acting as a comma that would separate each codon. In this situation, if an additional base were inserted into a codon, then only that codon would be affected. In a code without punctuations or gaps, however, insertion of a single nucleotide would result in all codons from that point on being affected. This would in turn change the amino acid sequence in the protein from that point on. Again, genetic evidence ruled out a punctuated code, as base insertions do, in fact, affect the entire protein from the insertion point on, rather than just a single amino acid. This effect is called a frameshift mutation.
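The contrast between a point mutation and an insertion in an unpunctuated, nonoverlapping code can be sketched in code. The mRNA sequence below is invented for the example, and the `translate` helper is a simplified stand-in that reads triplets from the first base and halts at a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard code as one-letter amino acid codes, '*' for stop, in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read non-overlapping triplets from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "AUGGCUGAAUUCAAAUGA"                  # AUG GCU GAA UUC AAA UGA
substituted = original[:6] + "C" + original[7:]  # point mutation: GAA -> CAA
inserted = original[:6] + "C" + original[6:]     # one extra base after GCU

print(translate(original))     # MAEFK
print(translate(substituted))  # MAQFK  (one amino acid changed)
print(translate(inserted))     # MARIQM (everything downstream changed)
```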
In the late 1970s DNA sequencing techniques were developed. A number of proteins had already been sequenced by protein sequencing methods. When the genes for these proteins were cloned and sequenced, the predicted protein sequence could be deduced. Agreement between the DNA and protein sequences confirmed the accuracy of the genetic code.
Exceptions to the Universal Genetic Code
After the original genetic code of E. coli was completed in 1968, the genetic code was subsequently determined for many other organisms ranging from bacteria to mammals, including humans. The codons were found to be the same for all organisms, leading to the idea that the genetic code is "universal." Furthermore, it also suggested that life on Earth had a single evolutionary origin, otherwise there would have been numerous genetic codes. The code was established during evolution, probably by chance, as there are no compelling reasons one codon should prevail over another. After it was established, any subsequent changes in the code would prove to be lethal, for if one codon changed, then all similar codons in the entire organism's genome would have to change simultaneously—a highly unlikely possibility.
Thus, it was surprising to find that there are, in fact, a few rare exceptions to the universal code. These exceptions are listed in Table 2. Most of these exceptions are found in the mitochondrial genome. The mitochondrion is thought to have evolved from an endosymbiotic bacterium at the time when the eukaryotic cell first arose. The mitochondrial genome is small, and most of the genes of the original endosymbiont have migrated to the nucleus.
Table 2. EXCEPTIONS TO THE UNIVERSAL GENETIC CODE
Organism | Normal codon | Usual meaning | New meaning |
Mammalian mitochondria | AGA, AGG | Arginine | Stop codon |
Mammalian mitochondria | AUA | Isoleucine | Methionine |
Mammalian mitochondria | UGA | Stop codon | Tryptophan |
Drosophila mitochondria | AGA, AGG | Arginine | Serine |
Drosophila mitochondria | AUA | Isoleucine | Methionine |
Drosophila mitochondria | UGA | Stop codon | Tryptophan |
Yeast mitochondria | AUA | Isoleucine | Methionine |
Yeast mitochondria | UGA | Stop codon | Tryptophan |
Yeast mitochondria | CUA, CUC, CUG, CUU | Leucine | Threonine |
Higher plant mitochondria | UGA | Stop codon | Tryptophan |
Higher plant mitochondria | CGG | Arginine | Tryptophan |
Protozoan nuclei | UAA, UAG | Stop codons | Glutamine |
Mycoplasma capricolum bacteria | UGA | Stop codon | Tryptophan |
In examining the exceptions to the universal genetic code in Table 2, you can see that there are only a few changes, most notably the use of a standard "stop" codon to encode an amino acid. For example, UGA normally is a stop codon. But in the mitochondria of the fruit fly Drosophila melanogaster, it encodes the amino acid tryptophan.
A few additional exceptions to the universal genetic code have also been identified. These are found in the nuclear genomes of a few protozoan species and in the bacterium Mycoplasma capricolum. These exceptions, however, do not imply multiple evolutionary origins of life. What is most striking is that the "exceptional" meanings of most of the codons are identical across all the organisms in which they are found, not different. Had there been multiple origins, we would expect to see drastically different genetic codes in these exceptional organisms.
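One way to picture these exceptions is as a small set of overrides applied to the standard table. The sketch below uses the mammalian mitochondrial reassignments listed in the table of exceptions; the test sequence, the dictionary names, and the `translate` helper are invented for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code as one-letter amino acid codes, '*' for stop, in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Overrides for mammalian mitochondria, as given in the table of exceptions.
MAMMALIAN_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "AUA": "M", "UGA": "W"}


def translate(mrna: str, table: dict) -> str:
    """Read triplets from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGAUAUGAAGACCC"  # AUG AUA UGA AGA CCC

print(translate(mrna, STANDARD))        # MI  (UGA read as stop)
print(translate(mrna, MAMMALIAN_MITO))  # MMW (AGA read as stop)
```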
see also Crick, Francis; Escherichia coli (E. coli bacterium); Nucleotide; Reading Frame; Ribosome; Transcription; Translation.
Ralph R. Meyer
Bibliography
"The Genetic Code." Cold Spring Harbor Symposia on Quantitative Biology, vol. 31. Cold Spring Harbor, NY: Cold Spring Harbor Press, 1966.
Kay, Lily E. Who Wrote the Book of Life? A History of the Genetic Code. Stanford, CA: Stanford University Press, 2000.
Nirenberg, M. W., and J. H. Matthaei. "The Dependence of Cell-Free Protein Synthesis in E. coli upon Naturally Occurring or Synthetic Polyribonucleotides." Proceedings of the National Academy of Sciences 47 (1961): 1588-1602.
Genetic Code
The genetic code allows an organism to translate the genetic information found in its chromosomes into usable proteins. Stretches of deoxyribonucleic acid (DNA) are built from four different nucleotide bases, while proteins are made from twenty unique subunits called amino acids. This numerical disparity presents an interesting problem: How does the cell translate the genetic information in the four-letter alphabet of DNA into the twenty-letter alphabet of protein? The conversion code is called the genetic code.
Requirements of a Code
The information transfer from DNA to protein, called gene expression, occurs in two steps. In the first step, called transcription, a DNA sequence is copied to make a template for protein synthesis called messenger ribonucleic acid (messenger RNA, or mRNA). During protein synthesis, ribosomes and transfer RNA (tRNA) use the genetic code to convert genetic information contained in mRNA into functional protein. (Formally speaking, the genetic code refers to the RNA-amino acid conversion code and not to DNA, though usage has expanded to refer more broadly to DNA.)
Mathematics reveals the minimum requirements for a genetic code. The ribosome must convert mRNA sequences that are written in four bases (A, G, U, and C) into proteins, which are made up of twenty different amino acids. A one base to one amino acid correspondence would code for only four amino acids (4¹). Similarly, all combinations of a two-base code (for example, AA, AU, AG, AC, etc.) will provide for only sixteen amino acids (4²). However, blocks of three RNA bases allow sixty-four (4³) combinations of the four nucleotides, which is more than enough combinations to correspond to the twenty distinct amino acids. So, the genetic code must use blocks of at least three RNA bases to specify each amino acid. (This reasoning assumes that each amino acid is encoded by the same size block of RNA.)
In addition, a ribosome must know where to start synthesizing a protein on an mRNA molecule and where to stop, and start and stop signals require their own RNA sequences. A series of experiments carried out in the 1960s confirmed these mathematical speculations, and went on to determine which triplet sequence (called a codon) specifies which amino acid.
Indeed, the genetic code uses codons of three bases each, such as ACC or CUG. Therefore, the protein synthesis machinery reads every triplet of bases along the mRNA and builds a chain of amino acids—a protein—accordingly. Reading triplets, however, would allow a ribosome to start at any one of three positions within a given triplet (see Fig. 1). The position that the ribosome chooses is based on the location of the start signal and is called the "reading frame."
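The three possible reading frames of a short mRNA can be listed with a small helper. The sequence below is made up from the codons mentioned above (ACC and CUG), and the function name `reading_frames` is hypothetical.

```python
def reading_frames(mrna: str) -> list:
    """Split a sequence into complete triplets for each of the three offsets."""
    return [
        [mrna[i:i + 3] for i in range(offset, len(mrna) - 2, 3)]
        for offset in range(3)
    ]


frames = reading_frames("ACCCUGACC")
for offset, codons in enumerate(frames):
    print(offset, codons)
# 0 ['ACC', 'CUG', 'ACC']
# 1 ['CCC', 'UGA']
# 2 ['CCU', 'GAC']
```

Note that the same bases read in the second frame happen to contain UGA, a stop codon, which shows how much the choice of frame matters.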
Experiments have shown that all but three of the sixty-four possible codons built from A, G, U, and C code for one amino acid each. Because sixty-one codons specify only twenty amino acids, most amino acids are encoded by more than one codon. In other words, the genetic code is said to be redundant or degenerate. This redundancy allows the protein-synthesizing machinery of the cell to get by with less, as will be seen below. The three remaining codons, the "nonsense" codons, indicate the end of the protein-coding region of an mRNA and are termed stop codons.
Starting, Stopping, and Making Protein
In any mRNA molecule, one codon always marks the beginning of a protein. That "start" codon is usually AUG in both eukaryotes and prokaryotes, although eukaryotes use GUG on rare occasions. AUG codes for the amino acid methionine. To start synthesizing at an AUG, however, ribosomes require more information besides a start codon; this information is found in the sequence surrounding the initial AUG. AUG codons in the middle of a protein-coding sequence are translated like any other codon.
Three codons signal the end of the protein-coding region of the mRNA template. These so-called stop codons, UAA, UAG, and UGA, do not code for any amino acid, and no tRNA recognizes them. Instead, proteins called release factors bind at the stop codon and cause the ribosome to release the newly synthesized protein.
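A simplified picture of starting and stopping is sketched below. It is only a toy model: it begins at the first AUG it finds and ignores the surrounding sequence context that, as noted above, real ribosomes also need. The sequence and the function name `translate_from_start` are invented for the example.

```python
from itertools import product

BASES = "UCAG"
# Standard code as one-letter amino acid codes, '*' for stop, in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_start(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# GG | AUG CCA AUG UGG UAA | GC : the internal AUG is read as methionine.
print(translate_from_start("GGAUGCCAAUGUGGUAAGC"))  # MPMW
print(translate_from_start("GGCCCUAA"))             # empty string: no AUG
```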
The complementary sequence of a codon found on a tRNA molecule is called its anticodon. The tRNA molecule matches up its anticodon with the correct codon on the mRNA. A tRNA molecule holds an amino acid in one of its molecular arms and works with the ribosome to add its amino acid to the protein being synthesized. Each tRNA is then reloaded with its specific amino acid by an enzyme in the cytosol.
Codons that code for the same amino acid are called redundant codons. The first two bases of redundant codons are usually the same and the third is either U or C, or alternatively A or G. For example, two redundant codons for the amino acid arginine are CGU and CGC, both of which pair with the same tRNA, despite having different third bases. This characteristic of the codon-anticodon interaction is called "wobble," and it allows organisms to have fewer than sixty-four distinct tRNA genes. In some tRNAs, wobble is made possible by a modified base within the anticodon. This modified base is called inosine (designated by I) and is made from adenine.
Evidence for Evolution
For almost all organisms tested, including humans, flies, yeast, and bacteria, the same codons are used to code for the same amino acids. Therefore, the genetic code is said to be universal. The universality of the genetic code strongly implies a common evolutionary origin to all organisms, even those in which the small differences have evolved. These include a few bacteria and protozoa that have a few variations, usually involving stop codons.
Mammalian mitochondria, which contain DNA, use the codon UGA not as a stop signal but instead to specify the amino acid tryptophan, and they have four stop codons instead of three. Also, the modified base inosine is not used in mitochondrial anticodons. Mitochondrial genetic codes from different organisms can also be distinct from each other as well as from the universal code, reflecting both their ancient bacterial origins and their long isolation within their host species.
see also Archaea; Cell Evolution; DNA; Eubacteria; Gene; Mitochondrion; Nucleotides; Protein Synthesis; Protista; Ribosome; Transcription
Mary Beckman
Bibliography
Alberts, Bruce, et al. Molecular Biology of the Cell, 4th ed. New York: Garland Publishing, 2000.
Creighton, Thomas E. Proteins: Structures and Molecular Properties, 2nd ed. New York: W. H. Freeman, 1993.
Freifelder, David. Molecular Biology, 2nd ed. Boston: Jones & Bartlett, 1987.
Lehninger, Albert L. Principles of Biochemistry. New York: Worth Publishers, 1982.
Genetic Code
Although the genetic code is not a "code" in the sense normally used in intelligence and espionage terminology, a fundamental understanding of the genetic code is essential to understanding the molecular basis of advanced DNA and genetic tests that are increasingly important in forensic science and identification technology.
The genetic information that is passed on from parent to offspring is carried by the DNA of a cell. The genes on the DNA code for specific proteins that determine appearance, different facets of personality, health, etc. In order for a gene to produce its protein, it must first be transcribed from DNA to RNA in a process known as transcription. Thus, transcription is defined as the transfer of genetic information from the DNA to the RNA. Translation is the process in which genetic information, carried by messenger RNA (mRNA), directs the synthesis of proteins from amino acids, whereby the primary structure of the protein is determined by the nucleotide sequence in the mRNA.
The genetic code is the set of correspondences between the nucleotide sequences of nucleic acids such as deoxyribonucleic acid (DNA), and the amino acid sequences of proteins (polypeptides). These correspondences enable the information encoded in the chemical components of DNA to be transferred to the ribonucleic acid messenger (mRNA) and then used to establish the correct sequence of amino acids in the polypeptide. The elements of the encoding system, the nucleotides, differ only in their bases, of which there are four: adenine (A), guanine (G), and cytosine (C), together with thymine (T) in DNA or uracil (U) in RNA. Thus RNA contains U in the place of T, and the nucleotide sequence of DNA acts as a template for the synthesis of a complementary sequence of RNA, a process known as transcription. For historical reasons, the term genetic code in fact refers specifically to the sequence of nucleotides in mRNA, although today it is sometimes used interchangeably with the coded information in DNA.
Proteins found in nature consist of 20 naturally occurring amino acids. One important question is, how can four nucleotides code for 20 amino acids? This question was raised by scientists in the 1950s soon after the discovery that DNA comprised the hereditary material of living organisms. It was reasoned that if a single nucleotide coded for one amino acid, then only four amino acids could be provided for. Alternatively, if two nucleotides specified one amino acid, then there could be a maximum number of 16 (4²) possible arrangements. If, however, three nucleotides coded for one amino acid, then there would be 64 (4³) possible permutations, more than enough to account for all the 20 naturally occurring amino acids. The latter suggestion was proposed by the Russian-born physicist George Gamow (1904–1968) and was later proved to be correct. It is now well known that every amino acid is coded by at least one nucleotide triplet or codon, and that some triplet combinations function as instructions for the termination or initiation of translation. Three combinations in mRNA, UAA, UGA, and UAG, are termination codons, while AUG is a translation start codon.
The genetic code was solved between 1961 and 1963. The American scientist Marshall Nirenberg (1927–), working with his colleague Heinrich Matthaei, made the first breakthrough when they discovered how to make synthetic mRNA. They found that if the nucleotides of RNA carrying the four bases A, G, C, and U were mixed in the presence of the enzyme polynucleotide phosphorylase, a single-stranded RNA was formed in the reaction, with the nucleotides being incorporated at random. This offered the possibility of creating specific mRNA sequences and then seeing which amino acids they would specify. The first synthetic mRNA polymer obtained contained only uracil (U), and when mixed in vitro with the protein-synthesizing machinery of Escherichia coli it produced polyphenylalanine, a string of phenylalanine residues. From this it was concluded that the triplet UUU coded for phenylalanine. Similarly, a pure cytosine (C) RNA polymer produced only the amino acid proline, so the codon for proline had to be CCC. This type of analysis was refined when nucleotides were mixed in different proportions in the synthetic mRNA and a statistical analysis was used to determine the amino acids produced. It was quickly found that a particular amino acid could be specified by more than one codon. Thus, the amino acid serine could be produced from any one of the combinations UCU, UCC, UCA, or UCG. In this way the genetic code is said to be degenerate, meaning that each of the 64 possible triplets has some meaning within the code and that several codons may encode a single amino acid.
This work confirmed the ideas of the British scientists Francis Crick (1916–) and Sydney Brenner (1927–). Brenner and Crick were working with mutations in the bacterial virus bacteriophage T4 and found that the deletion of a single nucleotide could abolish the function of a specific gene. However, a second mutation in which a nucleotide was inserted at a different, but nearby position, restored the function of that gene. These two mutations are said to be suppressors of each other, meaning that they cancel each other's mutant properties. It was concluded from this that the genetic code was read in a sequential manner starting from a fixed point in the gene. The insertion or deletion of a nucleotide shifted the reading frame in which succeeding nucleotides were read as codons, and was thus termed a frameshift mutation. It was also found that whereas two closely spaced deletions, or two closely spaced insertions, could not suppress each other, three closely spaced deletions or insertions could do so. Consequently, these observations established the triplet nature of the genetic code. The reading frame of a sequence is the way in which the sequence is divided into the triplets and is determined by the precise point at which translation is initiated. For example, the sequence CATCATCAT can be read CAT CAT CAT or C ATC ATC AT or CA TCA TCA T in the three possible reading frames. Sometimes, as in particular bacterial viruses, genes have been found that are contained within other genes. These are translated in different reading frames so the amino acid sequences of the proteins encoded by them are different. Such economy of genetic material is, however, quite rare.
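The logic of the frameshift experiments can be imitated with the CAT example above. In this sketch the helper names and the positions of the mutations are arbitrary choices for illustration; only the splitting into triplets is modeled, not the biology of phage T4.

```python
def split_codons(seq: str) -> list:
    """Divide a sequence into complete triplets, starting at the first base."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def delete_base(seq: str, index: int) -> str:
    return seq[:index] + seq[index + 1:]


def insert_base(seq: str, index: int, base: str) -> str:
    return seq[:index] + base + seq[index:]


wild_type = "CAT" * 5

# One deletion shifts every downstream triplet.
single_deletion = delete_base(wild_type, 4)
# A nearby insertion restores the original frame after a short garbled stretch.
suppressed = insert_base(single_deletion, 7, "G")
# Three nearby deletions also restore the frame (highest index removed first).
triple_deletion = wild_type
for index in (8, 6, 4):
    triple_deletion = delete_base(triple_deletion, index)

print(split_codons(wild_type))        # ['CAT', 'CAT', 'CAT', 'CAT', 'CAT']
print(split_codons(single_deletion))  # ['CAT', 'CTC', 'ATC', 'ATC']
print(split_codons(suppressed))       # ['CAT', 'CTC', 'AGT', 'CAT', 'CAT']
print(split_codons(triple_deletion))  # ['CAT', 'CTA', 'CAT', 'CAT']
```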
The same genetic code appears to operate in all living things, but exceptions to this universality are known. In human mitochondrial mRNA, AGA and AGG are termination or stop codons. Other differences also exist in the correspondences between certain codon sequences and amino acids.
FURTHER READING:
BOOKS:
Brenner, Sydney. My Life in Science. London: BioMed Central, Ltd., 2001.
Davies, Kevin. Cracking The Genome: Inside The Race To Unlock Human DNA. New York: Free Press, 2001.
Watson, James D. The Double Helix: A Personal Account of the Discovery of the Structure of DNA. Westport, CT: Touchstone Books, 2001.
——. DNA: The Secret of Life. New York: Knopf, 2003.
SEE ALSO
DNA Fingerprinting
Forensic Science
Genetic Information: Ethics, Privacy, and Security Issues
Genetic Technology
Genomics
Genetic Code
The genetic code is the set of correspondences between the nucleotide sequences of nucleic acids such as DNA (deoxyribonucleic acid), and the amino acid sequences of polypeptides. These correspondences enable the information encoded in the chemical components of the DNA to be transferred to the ribonucleic acid messenger (mRNA), and then to be used to establish the correct sequence of amino acids in the polypeptide. The elements of the encoding system, the nucleotides, differ only in their bases, of which there are four: adenine (A), guanine (G), and cytosine (C), together with thymine (T) in DNA or uracil (U) in RNA. Thus, RNA contains U in the place of T, and the nucleotide sequence of DNA acts as a template for the synthesis of a complementary sequence of RNA, a process known as transcription. For historical reasons, the term genetic code in fact refers specifically to the sequence of nucleotides in mRNA, although today it is sometimes used interchangeably with the coded information in DNA.
Proteins found in nature consist of 20 naturally occurring amino acids. One important question is, how can four nucleotides code for 20 amino acids? This question was raised by scientists in the 1950s soon after the discovery that DNA comprised the hereditary material of living organisms. It was reasoned that if a single nucleotide coded for one amino acid, then only four amino acids could be provided for. Alternatively, if two nucleotides specified one amino acid, then there could be a maximum number of 16 (4²) possible arrangements. If, however, three nucleotides coded for one amino acid, then there would be 64 (4³) possible permutations, more than enough to account for all the 20 naturally occurring amino acids. The latter suggestion was proposed by the Russian-born physicist George Gamow (1904–1968) and was later proved correct. It is now well known that every amino acid is coded by at least one nucleotide triplet or codon, and that some triplet combinations function as instructions for the termination or initiation of translation. Three combinations in mRNA, UAA, UGA, and UAG, are termination codons, while AUG is a translation start codon.
The genetic code was solved between 1961 and 1963. The American scientist Marshall Nirenberg (1927–), working with his colleague Heinrich Matthaei, made the first breakthrough when they discovered how to make synthetic mRNA. They found that if the nucleotides of RNA carrying the four bases A, G, C, and U were mixed in the presence of the enzyme polynucleotide phosphorylase, a single-stranded RNA was formed in the reaction, with the nucleotides being incorporated at random. This offered the possibility of creating specific mRNA sequences and then seeing which amino acids they would specify. The first synthetic mRNA polymer obtained contained only uracil (U), and when mixed in vitro with the protein-synthesizing machinery of Escherichia coli it produced polyphenylalanine, a string of phenylalanine residues. From this it was concluded that the triplet UUU coded for phenylalanine. Similarly, a pure cytosine (C) RNA polymer produced only the amino acid proline, so the codon for proline had to be CCC. This type of analysis was refined when nucleotides were mixed in different proportions in the synthetic mRNA and a statistical analysis was used to determine the amino acids produced. It was quickly found that a particular amino acid could be specified by more than one codon. Thus, the amino acid serine could be produced from any one of the combinations UCU, UCC, UCA, or UCG. In this way the genetic code is said to be degenerate, meaning that each of the 64 possible triplets has some meaning within the code and that several codons may encode a single amino acid.
This work confirmed the ideas of the British scientists Francis Crick (1916–) and Sydney Brenner (1927–). Brenner and Crick were working with mutations in the bacterial virus bacteriophage T4 and found that the deletion of a single nucleotide could abolish the function of a specific gene. However, a second mutation in which a nucleotide was inserted at a different, but nearby position, restored the function of that gene. These two mutations are said to be suppressors of each other, meaning that they cancel each other's mutant properties. It was concluded from this that the genetic code was read in a sequential manner starting from a fixed point in the gene. The insertion or deletion of a nucleotide shifted the reading frame in which succeeding nucleotides were read as codons, and was thus termed a frameshift mutation. It was also found that whereas two closely spaced deletions, or two closely spaced insertions, could not suppress each other, three closely spaced deletions or insertions could do so. Consequently, these observations established the triplet nature of the genetic code. The reading frame of a sequence is the way in which the sequence is divided into the triplets and is determined by the precise point at which translation is initiated. For example, the sequence CATCATCAT can be read CAT CAT CAT or C ATC ATC AT or CA TCA TCA T in the three possible reading frames. Sometimes, as in particular bacterial viruses, genes have been found that are contained within other genes. These are translated in different reading frames so the amino acid sequences of the proteins encoded by them are different. Such economy of genetic material is, however, quite rare.
The same genetic code appears to operate in all living things, but exceptions to this universality are known. In human mitochondrial mRNA, AGA and AGG are termination or stop codons. Other differences also exist in the correspondences between certain codon sequences and amino acids. In ciliates, there are also unusual features: UAA and UAG code for glutamine (encoded only by CAA and CAG in other eukaryotes), and the only termination codon appears to be UGA.
See also Bacteriophage and bacteriophage typing; Gene amplification; Genetic identification of microorganisms; Genetic mapping; Genetic regulation of eukaryotic cells; Genetic regulation of prokaryotic cells; Genotype and phenotype; Immunogenetics
Genetic Code
Some forensic identification techniques that detect living organisms or their products (e.g., toxins) rely on the detection of genetic sequences within the organism's genetic material. These tests can be exquisitely sensitive, allowing the detection of only a few organisms.
For example, tests have established that fewer than a dozen Escherichia coli bacteria can be detected in samples such as food and water. To put that in perspective, over a million bacterial cells will fit into the period at the end of this sentence.
A fundamental understanding of the genetic code is essential to understanding the molecular basis of the advanced deoxyribonucleic acid (DNA) and genetic tests that are increasingly important in forensic science and identification technology.
The genetic information that is passed on from parent to offspring is carried by the DNA of a cell. The genes on the DNA code for specific proteins that determine all aspects of the organism. In order for a gene to produce the proteins, the gene must first be transcribed from DNA to RNA (specifically, a type of RNA called messenger RNA; mRNA) in a process known as transcription. Translation is the process in which genetic information, carried by the mRNA, directs the synthesis of proteins from amino acids. The primary structure of the protein is determined by the nucleotide sequence in the mRNA.
The elements of the encoding system, the nucleotides, differ by only four different bases. These are known as adenine (A), guanine, (G), thymine (T) and cytosine (C), in DNA or uracil (U) in RNA. Thus RNA contains U in the place of C.
Proteins found in nature consist of 20 naturally occurring amino acids. One important question is, how can four nucleotides code for 20 amino acids? If a single nucleotide coded for one amino acid, then only four amino acids could be provided for. Alternatively, if two nucleotides specified one amino acid, then there could be a maximum number of 16 (42) possible arrangements. If, however, three nucleotides coded for one amino acid, then there would be 64 (43) possible permutations, more than enough to account for all the 20 naturally occurring amino acids. The latter, which was proposed by the Russian born physicist, George Gamow (1904–1968), was proved to be correct.
It is now well known that every amino acid is coded by at least one nucleotide triplet or codon, and that some triplet combinations function as instructions for the termination or initiation of translation. Three combinations in mRNA, UAA, UGA, and UAG, are termination codons, while AUG is a translation start codon.
The genetic code was solved between 1961 and 1963. The American scientist Marshall Nirenberg (1927–2010), working with his colleague Heinrich Matthaei, made the first breakthrough when they discovered how to make synthetic mRNA. They found that if the nucleotides of RNA carrying the four bases A, G, C and U were mixed in the presence of the enzyme polynucleotide phosphorylase, a single stranded RNA was formed in the reaction, with the nucleotides being incorporated at random. This offered the possibility of creating specific mRNA sequences and then seeing which amino acids they would specify. The first synthetic mRNA polymer obtained contained only uracil (U), and when mixed in vitro with the protein synthesizing machinery of Escherichia coli it produced polyphenylalanine, a chain made only of phenylalanine. From this it was concluded that the triplet UUU coded for phenylalanine. Similarly, a pure cytosine (C) RNA polymer produced only the amino acid proline, so the codon for proline had to be CCC. This type of analysis was refined when nucleotides were mixed in different proportions in the synthetic mRNA and a statistical analysis was used to determine the amino acids produced. It was quickly found that a particular amino acid could be specified by more than one codon. Thus, the amino acid serine could be produced from any one of the combinations UCU, UCC, UCA, or UCG. In this way the genetic code is said to be degenerate, meaning that each of the 64 possible triplets has some meaning within the code and that several codons may encode a single amino acid.
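The logic of the homopolymer experiments can be mimicked with a toy translator. The sketch below contains only the six codons named in the paragraph above, and the names `CODONS` and `translate` are illustrative assumptions rather than part of any library.

```python
# Only the codons mentioned in the text; the full code has 64 entries.
CODONS = {
    "UUU": "Phe",
    "CCC": "Pro",
    "UCU": "Ser", "UCC": "Ser", "UCA": "Ser", "UCG": "Ser",
}

def translate(mrna):
    """Read an mRNA three bases at a time from its first base.

    A trailing fragment shorter than three bases is ignored.
    """
    usable = len(mrna) - len(mrna) % 3
    return [CODONS[mrna[i:i + 3]] for i in range(0, usable, 3)]

print(translate("U" * 12))   # poly-U gives a chain of phenylalanine only
print(translate("C" * 9))    # poly-C gives a chain of proline only

# Degeneracy: four different codons all yield serine.
print({codon: translate(codon)[0] for codon in ("UCU", "UCC", "UCA", "UCG")})
```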
This work confirmed the ideas of the British scientists Francis Crick (1916–2004) and Sydney Brenner (1927–2019). Brenner and Crick were working with mutations in the bacterial virus bacteriophage T4 and found that the deletion of a single nucleotide could abolish the function of a specific gene. However, a second mutation in which a nucleotide was inserted at a different but nearby position restored the function of that gene. These two mutations are said to be suppressors of each other, meaning that they cancel each other's mutant properties. It was concluded from this that the genetic code was read in a sequential manner starting from a fixed point in the gene. The insertion or deletion of a nucleotide shifted the reading frame in which succeeding nucleotides were read as codons, and was thus termed a frameshift mutation. It was also found that whereas two closely spaced deletions, or two closely spaced insertions, could not suppress each other, three closely spaced deletions or insertions could do so. Consequently, these observations established the triplet nature of the genetic code. The reading frame of a sequence is the way in which the sequence is divided into the triplets and is determined by the precise point at which translation is initiated. For example, the sequence CATCATCAT can be read CAT CAT CAT or C ATC ATC AT or CA TCA TCA T in the three possible reading frames. Sometimes, as in particular bacterial viruses, genes have been found that are contained within other genes. These are translated in different reading frames so the amino acid sequences of the proteins encoded by them are different. Such economy of genetic material is, however, quite rare.
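Reading frames and frameshift suppression are easy to demonstrate on the CAT repeat used above. The following is a minimal sketch; the function name `codons` and the particular positions chosen for the deletion and insertion are arbitrary choices made for illustration.

```python
def codons(seq, frame=0):
    """Split seq into complete triplets, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

# The three reading frames of the example sequence.
for frame in range(3):
    print(frame, codons("CATCATCAT", frame))

seq = "CATCATCATCATCAT"

# Deleting one base shifts every codon downstream of the deletion.
deleted = seq[:4] + seq[5:]
print(codons(deleted))

# Inserting one base nearby restores the original frame further downstream,
# so only the codons between the two mutations remain altered.
restored = deleted[:6] + "G" + deleted[6:]
print(codons(restored))
```

In this toy case the two mutations leave two changed codons between them, after which the sequence is again read as CAT CAT, which parallels the mutual suppression described in the text.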
The same genetic code appears to operate in all living things, but exceptions are known. In human mitochondrial mRNA, AGA and AGG are termination or stop codons. Other differences also exist in the correspondences between certain codon sequences and amino acids.
see also Analytical instrumentation; Anthrax, investigation of 2001 murders; Bacterial biology; Biological weapons, genetic identification; DNA fingerprint; DNA sequences, unique; Mitochondrial DNA analysis; Pathogen genomic sequencing; PCR (polymerase chain reaction); RFLP (restriction fragment length polymorphism).
Genetic Code
The genetic code tells a cell how to interpret the chemical information stored inside deoxyribonucleic acid (DNA). This information is in the form of a sequence of chemicals that tell a cell which proteins to make. Without the genetic code, the cell would be unable to interpret the DNA sequence, and therefore could not make the proteins that build cells and make them work.
By the early 1950s, scientists knew that genes were made of DNA, and that specific proteins were made by specific genes. DNA is found in the chromosomes in the nucleus of cells, and it controls the characteristics of living things by means of a chemical code of instructions. The structure of the DNA molecule was found to resemble a twisted ladder called a double helix. The rungs on this ladder are made of paired "bases," and the bases are the coded instructions. These instructions are written with only four chemicals, adenine (A), thymine (T), guanine (G), and cytosine (C), that make up what might be considered a four-letter alphabet. These bases must pair up a certain way (A only with T, and G only with C). Each base, together with the sugar and phosphate that attach it to the side of the ladder, is called a nucleotide. It is the order of these nucleotides along the DNA that spells out the instructions for making proteins, which control the characteristics of organisms.
Proteins are chains made of twenty different units called amino acids, and it is the order of the amino acids that determines what type of protein will be produced. After the 1950s discovery of the molecular structure of DNA, the question that drove geneticists during the 1960s was: "How is a gene, whose information is contained in the sequence of only a four-letter alphabet (A, T, G, C) able to code enough messages for twenty different amino acids?" If a single base coded for one amino acid, only four amino acids could be made. If two bases coded for one amino acid, then a maximum of sixteen arrangements was possible. However, if the four bases somehow combined in groups of three to specify one amino acid, sixty-four combinations were possible. After a great deal of difficult research, this triplet code, in which each triplet is called a "codon," proved to be the answer. The explanation somewhat resembles that of the Morse code, which is able to code all twenty-six letters of the alphabet using only two symbols, a dot and a dash. It does this by using different combinations of dots and dashes to code for each letter of the alphabet. With DNA, the answer lies in the codons or triplet code. Each codon is three bases long and has an exact meaning. In other words, a group of three bases in a certain order forms the codon for a specific amino acid. For example, the sequence GAG specifies the amino acid glutamic acid.
Once the idea of a triplet code was discovered, years of work resulted in what might be called a working dictionary of codes. It was found that of the sixty-four possible combinations, sixty-one of the codons are actually used to specify the twenty amino acids. This means that some amino acids can be specified by more than one type of codon. It also means that the remaining three codons do not code for any amino acid but instead act as punctuation in a long message. Thus, these three codons can signal the end of a genetic "sentence" and therefore the completion of a code. It makes sense that, just as a paragraph of words has punctuation guiding the reader, a continuous sequence of hundreds of thousands of bases needs punctuation to make it a meaningful set of precise instructions. These three codons therefore not only end a code, but are thought to also signal something like, "I am not a gene," or "I am not a gene but one is coming soon." Interestingly, however, no commas or internal punctuation are found within the code. Punctuation is limited to stop or start signals at the beginning or end of a continuous run of triplets. The code was also found to be linear, meaning that just like a sentence, its sequence of bases is read from a fixed starting point (the beginning). Once this genetic code was broken, it was found that the code is essentially universal. That is, the very same three-letter codons specify the same amino acids in nearly all living things, from humans to bacteria. All life is therefore guided or directed by a common language that is the genetic code written in all DNA.
[See also DNA; Gene; Gene Theory; Genetics]
genetic code
genetic code The means by which genetic information in DNA controls the manufacture of specific proteins by the cell. The code takes the form of a series of triplets of bases in DNA, from which is transcribed a complementary sequence of codons in messenger RNA (see transcription). The sequence of these codons determines the sequence of amino acids during protein synthesis. There are 64 possible codons from the combinations of the four bases present in DNA and messenger RNA, and 20 amino acids present in body proteins: some of the amino acids are coded by more than one codon, and some codons have other functions (see start codon; stop codon).
The genetic code
First base in codon | Second base U | Second base C | Second base A | Second base G | Third base in codon |
---|---|---|---|---|---|
U | UUU Phe | UCU Ser | UAU Tyr | UGU Cys | U |
U | UUC Phe | UCC Ser | UAC Tyr | UGC Cys | C |
U | UUA Leu | UCA Ser | UAA (stop codon) | UGA (stop codon) | A |
U | UUG Leu | UCG Ser | UAG (stop codon) | UGG Trp | G |
C | CUU Leu | CCU Pro | CAU His | CGU Arg | U |
C | CUC Leu | CCC Pro | CAC His | CGC Arg | C |
C | CUA Leu | CCA Pro | CAA Gln | CGA Arg | A |
C | CUG Leu | CCG Pro | CAG Gln | CGG Arg | G |
A | AUU Ile | ACU Thr | AAU Asn | AGU Ser | U |
A | AUC Ile | ACC Thr | AAC Asn | AGC Ser | C |
A | AUA Ile | ACA Thr | AAA Lys | AGA Arg | A |
A | AUG Met (start codon) | ACG Thr | AAG Lys | AGG Arg | G |
G | GUU Val | GCU Ala | GAU Asp | GGU Gly | U |
G | GUC Val | GCC Ala | GAC Asp | GGC Gly | C |
G | GUA Val | GCA Ala | GAA Glu | GGA Gly | A |
G | GUG Val | GCG Ala | GAG Glu | GGG Gly | G |
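A table like the one above maps naturally onto a dictionary lookup. The sketch below builds the standard code compactly and translates an mRNA from its first AUG to the first stop codon. It is a simplification: one-letter amino acid codes are used instead of the three-letter abbreviations in the table, "*" stands for a stop codon, and the names `CODON_TABLE` and `translate` are illustrative. Real start-site selection in cells is more involved than finding the first AUG.

```python
from itertools import product

BASES = "UCAG"  # same row and column order as the table
# One-letter amino acid codes listed in table order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna):
    """Translate from the first AUG up to, but not including, a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# AUG UUU CCC UCU UAA -> Met, Phe, Pro, Ser, then stop
print(translate("GGAUGUUUCCCUCUUAAGG"))
```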
genetic code
genetic code The set of correspondences between base (nucleotide pair) triplets in DNA and amino acids in protein. These base triplets carry the genetic information for protein synthesis. For example, the DNA template-strand triplet CAA (cytosine, adenine, adenine) is transcribed into the messenger RNA codon GUU, which codes for valine. The code is universal, but degenerate, in that certain amino acids are coded for by more than one codon, and the codon bias (the preferred codon for a particular amino acid from the several possible choices) differs somewhat in different organisms and is different in nuclear, mitochondrial, and chloroplast DNA.
amino acid | abbreviation | codons |
---|---|---|
alanine | Ala | GCA, GCC, GCG, GCU |
arginine | Arg | AGA, AGG, CGA, CGG, CGC, CGU |
asparagine | Asn | AAC, AAU |
aspartic acid | Asp | GAC, GAU |
cysteine | Cys | UGC, UGU |
glutamic acid | Glu | GAA, GAG |
glutamine | Gln | CAA, CAG |
glycine | Gly | GGA, GGC, GGG, GGU |
histidine | His | CAC, CAU |
isoleucine | Ile | AUA, AUC, AUU |
leucine | Leu | CUA, CUC, CUG, CUU, UUA, UUG |
lysine | Lys | AAA, AAG |
methionine | Met | AUG |
phenylalanine | Phe | UUC, UUU |
proline | Pro | CCA, CCC, CCG, CCU |
serine | Ser | AGC, AGU, UCA, UCC, UCG, UCU |
threonine | Thr | ACA, ACC, ACG, ACU |
tryptophan | Trp | UGG |
tyrosine | Tyr | UAC, UAU |
valine | Val | GUA, GUC, GUG, GUU |
stop codon | | UAA, UAG, UGA |
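A table organized by amino acid is simply the codon-to-amino-acid mapping turned around. The sketch below performs that inversion and counts how many codons specify each amino acid. One-letter codes are used for compactness, "*" stands for a stop codon, and all variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard code in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the mapping: amino acid -> list of codons that specify it.
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

degeneracy = {aa: len(codon_list) for aa, codon_list in codons_for.items()}

print(sorted(codons_for["S"]))   # the six serine codons
print(degeneracy["M"], degeneracy["W"])   # methionine and tryptophan have one codon each
print(sum(n for aa, n in degeneracy.items() if aa != "*"))   # sense codons
```

The counts reproduce the structure of the table: twenty amino acids share sixty-one codons, and the remaining three codons are stops.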
Genetic Code
The genetic code, which carries the instructions on what a human (or any other living creature) will be like, from color of eyes to tendencies toward disease, is carried by chains of molecules called nucleotides inside the nucleus (center) of body cells. The genetic material, which is made up of bases in combination with sugar and phosphate molecules, is called DNA (deoxyribonucleic acid).
In addition, RNA (ribonucleic acid), a simpler molecule, works closely with DNA by carrying its instructions to the parts of the cell where proteins are made. DNA is structured as a double helix, with two strands twisted around each other and joined by rungs like those of a ladder. The rungs are made up of pairs of four chemical bases, guanine (G), adenine (A), thymine (T) and cytosine (C), while the sides of the ladder are made up of sugar and phosphate.
These bases are repeated in particular arrays of sequences throughout the DNA molecule. The patterns they create provide the instructions on how cells will create proteins and what their tasks will be. DNA is packed into structures called chromosomes within the cell.
History
The discovery of the location of our genetic code began with agriculture. In the 1860s an Austrian monk, Gregor Mendel (1822-1884), showed through his pea breeding experiments that certain characteristics were passed from one pea generation to the next. But he did not know where the pea's genetic instructions were. In 1909 Russian-American chemist Phoebus Theodore Levene identified ribose, the sugar found in RNA, and he later showed that DNA and RNA are chemically distinct, but he could not determine their structure.
By the early 1950s, thanks to British scientist Francis Crick (1916-2004) and American biologist James Watson (1928-) and their discovery of the structure of DNA, scientists knew that genes were made of this nucleic acid and that specific cell proteins were the products of specific genes. The exact link between DNA and proteins was less well understood, however. Since proteins are considered the language of life, researchers believed that the DNA molecule might be the code for this language. This is how the term "genetic code" originated.
By the 1960s researchers had figured out the relationship between DNA and the cell proteins it gives instructions to, and the long process of finding the codes for specific traits was begun. Once a specific gene can be isolated, its code can then be copied and used to create synthetic genes, which can then be used to change the genetic structures in the human body. In principle, diseases that are caused by defective genes might be treated in this way. Aside from these genetic engineering applications, Watson and his research lab at Cold Spring Harbor are currently working on mapping the entire genetic code (millions of different sequences) for the human body.
We still have a very long way to go in understanding the specific genetic codes for the millions of different traits that make up the human body, but the possibilities for the medical community alone are almost endless.
[See also Genetic engineering ]
genetic code
genetic code Arrangement of information stored in genes. It is the ultimate basis of heredity and forms a blueprint for the entire organism. The genetic code is based on the genes that are present, which, in molecular terms, depend on the arrangement of nucleotides in the long molecules of DNA in the cell chromosomes. Each group of three nucleotides specifies, or codes for, an amino acid or an action such as start or stop. By specifying which proteins to make and in what quantities, the genetic code directly controls production of structural materials. It also codes for enzymes, which regulate all the chemical reactions in the cell, thus indirectly coding for the production of other cell materials as well. See also genome