Genetic Variation Among Populations
THE APPORTIONMENT OF VARIATION
VARIATION AMONG POPULATIONS (FST)
ESTIMATES OF FST FOR GEOGRAPHIC RACES
SKIN COLOR, RACIAL CLASSIFICATION, AND FST
GEOGRAPHIC DISTANCE AND THE PATTERN OF GENETIC VARIATION
Questions regarding the usefulness of the concept of “race” to the study of human genetic diversity must ultimately be answered with reference to the degree and patterning of genetic variation. Specifically, three questions must be addressed. First, how much variation exists among populations, relative to the amount of variation within populations? Second, what is the pattern of genetic variation among populations? That is, are all populations equally related—and if not, what are the geographic and historical factors that have influenced the genetic relationship among populations? Third, do studies of the degree and pattern of human genetic variation provide any answers to questions regarding the utility of the “race” concept?
THE APPORTIONMENT OF VARIATION
One of the main interests in studies of genetic variation is the question of how variation is apportioned both within and among populations. In other words, if a species is considered as made up of a number of different populations, how much of the total variation in the species exists within each population, and how much variation exists among all the populations? Although it is most convenient to define and discuss these concepts in mathematical terms, an intuitive approach is taken here in order to provide an understanding of the basic principles behind the apportionment of diversity.
The amount of variation within a population refers to the differences that exist between the members of that population. If, for example, a population consisted entirely of clones, then everyone in the population would be genetically the same, and there would be no variation within the population. The more different the individuals are from each other genetically, the greater the level of variation within the population. The exact level of this variation can be measured in different ways, depending on the specific measure or estimate of genetic variation at which one is looking. Variation among populations refers to the level of differences between two or more populations. If two populations were genetically the same, then there would be no variation among the populations. The more different the populations are from each other, the greater the level of variation among the populations.
A simple example of how these concepts work uses an analogy based on sorting out shapes. So, if one has ten squares and ten triangles, what are the different ways these twenty objects can be placed into two buckets, with each containing half of the objects? Three different cases are illustrated in Figure 1. In case number one, the first bucket contains ten squares and the second bucket contains ten triangles. Because all of the objects in the first bucket are squares, they are by definition all the same, so there is no variation within that bucket. The same result applies to the second bucket: each of the ten objects is a triangle, so there is no variation within that bucket. In both cases, the amount of variation within buckets is zero. Now, consider the amount of variation that exists among the buckets. This is done by considering the frequency of squares and triangles in each bucket. The first bucket is made up of 100 percent squares and 0 percent triangles, whereas the second bucket has 0 percent squares and 100 percent triangles. In other words, the composition of the two buckets is completely different. When apportioning diversity, the amount of within-group variation plus the amount of among-group variation adds up to 100 percent. In this example, all of the variation exists among the two buckets, so that we could state that the amount of variation among groups is 100 percent and the amount of variation within groups is 0 percent.
The second case in Figure 1 shows the opposite pattern. There are still ten squares and ten triangles, but they are apportioned differently between the two buckets. Each bucket contains five squares and five triangles. Thus, there is variation within each bucket, because there are the two different shapes in each. However, there is no difference in the relative frequency of squares and triangles among the two buckets, as each bucket consists of 50 percent squares and 50 percent triangles. Because the two frequencies are the same, there is no difference among the buckets, and therefore the level of among-group variation is zero. In this case, all of the variation is within the buckets, meaning that the among-group variation is 0 percent and the within-group variation is 100 percent.
The third case in Figure 1 has the first bucket containing six squares and four triangles and the second bucket containing four squares and six triangles. Thus, some variation exists within each bucket and, because the proportions of squares and triangles in the two buckets are not quite the same, some variation exists among the buckets as well.
What does all of this have to do with genetics and populations? The same principles of apportionment of variation apply to genetic data. Completing the analogy, consider the squares and triangles as equivalent to different forms of a gene and the buckets as equivalent to populations. In genetics, we refer to different forms of a gene as alleles. When looking at biochemical and molecular data, such as blood groups and DNA markers, there are standard methods for measuring the levels of within-group and among-group variation based on the relative frequency of alleles.
VARIATION AMONG POPULATIONS (FST)
In population genetics, researchers are interested in the relative amount of variation that exists among populations, a term known by a number of symbols and names, but most often labeled FST. FST is the proportion of total variation that is due to variation among populations. The value of FST can range from 0.0 to 1.0 (or, in terms of percentages, from 0 percent to 100 percent). Considering the objects in Figure 1 as equivalent to alleles, the first case would have an FST equal to 1.0, meaning that the two populations are completely different in their allele frequencies and that everyone within the groups is genetically the same. In the second case in Figure 1, FST is equal to 0.0, meaning that the two populations have the same allele frequencies and that all of the genetic variation in the species occurs within the populations. The solution for FST in the third case is not intuitively obvious but can be computed using a standard population genetics formula, which results in FST being 0.04 in this example. This value means that 4 percent of the total variation exists among the two populations, leaving the remainder (96%) of the variation existing within the populations.
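The three cases in Figure 1 can be checked with a short calculation. The text does not say which "standard population genetics formula" it uses, so the sketch below assumes one common definition: FST = (HT − HS) / HT, where HT is the expected heterozygosity computed from the average allele frequency and HS is the average expected heterozygosity within populations. It also assumes two alleles and equally sized populations, as in the bucket example. The function names are illustrative only.

```python
def heterozygosity(p):
    """Expected heterozygosity at a two-allele locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_alleles(freqs):
    """FST from the frequency of one allele in each of several populations.

    Assumes two alleles per locus and equally sized populations.
    """
    mean_p = sum(freqs) / len(freqs)
    h_total = heterozygosity(mean_p)  # HT: diversity in the pooled "species"
    if h_total == 0.0:
        raise ValueError("no variation in the total; FST is undefined")
    # HS: average diversity within the individual populations
    h_within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# Frequency of "squares" in bucket 1 and bucket 2 for the three cases.
cases = {
    "case 1": [1.0, 0.0],  # ten squares / ten triangles
    "case 2": [0.5, 0.5],  # five and five in each bucket
    "case 3": [0.6, 0.4],  # six and four / four and six
}

for name, freqs in cases.items():
    print(name, round(fst_two_alleles(freqs), 4))
```

Under these assumptions the calculation gives 1.0, 0.0, and 0.04 for the three cases, matching the values quoted in the text.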
In reality, what is desired is not a reliance on only one gene for these estimates, but instead an average across as many genes as possible. There are several reasons for this. First, using numerous genes where possible minimizes sampling error. Second, natural selection can lead to differences in FST above or below what would be expected on average. If, for example, one were looking at a gene where different alleles were selected for in different populations, then the genetic difference between the populations would be greater than expected for genes not affected by differences in adaptation (neutral genes). Overall, FST is affected by the balance between gene flow (and mutation) and genetic drift. Gene flow and mutation lower the average FST and genetic drift increases average FST.
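The balance between gene flow and genetic drift can be made concrete with a classic textbook approximation that is not part of this entry and is introduced here only as an illustration: under Sewall Wright's island model, the expected equilibrium value for neutral nuclear genes is roughly FST ≈ 1 / (4Nm + 1), where N is the effective population size and m is the migration rate per generation. The sketch below assumes that model and leaves mutation out; mutation would act in the same direction as migration. The function name and the parameter values are hypothetical.

```python
def island_model_fst(effective_size, migration_rate):
    """Approximate equilibrium FST under Wright's island model.

    Assumes neutral nuclear genes, equal-sized populations, and no mutation.
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    if not 0.0 <= migration_rate <= 1.0:
        raise ValueError("migration_rate must be between 0 and 1")
    return 1.0 / (4.0 * effective_size * migration_rate + 1.0)


# Hypothetical parameter values, chosen only to show the direction of change.
for n, m in [(100, 0.001), (100, 0.01), (100, 0.1), (1000, 0.01)]:
    print(f"N={n:5d}  m={m:<6}  FST ~ {island_model_fst(n, m):.3f}")
```

Raising the migration rate lowers the expected FST, while a smaller effective population size (stronger genetic drift) raises it, which is the qualitative relationship described above.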
ESTIMATES OF FST FOR GEOGRAPHIC RACES
Given this background, the discussion can now return to the question of the amount of genetic variation that exists between races. This problem was first tackled quantitatively by Richard Lewontin in 1972 by using allele frequencies from across the world for a number of genetic markers based on red blood cells. Lewontin then subdivided the world into seven geographic “races” (although noting the difficulty in doing so): “Caucasians,” “Black Africans,” “Mongoloids,” “South Asian Aborigines,” “Amerinds,” “Oceanians,” and “Australian Aborigines.” Within each of the seven races, he collated genetic data for a number of different local populations. For example, within the “Caucasian” race, he collected data on Belgians, Greeks, Italians, Iranians, Indians, and other populations in Europe, the Middle East, and South Asia. By looking at data at the level of race and local population, Lewontin was able to extend the principle of apportionment by breaking down the “within-race” component into: (1) variation among local populations within race, and (2) variation within local populations. He found that 6.3 percent of the total variation existed among races, 8.3 percent existed among local populations within races, and 85.4 percent existed within local populations. Lewontin concluded “human races and populations are remarkably similar to each other, with the largest part by far of human variation being accounted for by the differences between individuals” (Lewontin 1972, p. 397).
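The three-level breakdown can be sketched in code. Lewontin's 1972 analysis was based on a Shannon-type diversity measure; the version below follows that idea in simplified form and should not be read as a reproduction of his method. It assumes that all populations within a race, and all races, are weighted equally. The allele frequencies are invented placeholders for a single two-allele marker, not real data, and the names are illustrative.

```python
import math


def shannon(freqs):
    """Shannon diversity (in bits) of a list of allele frequencies."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def mean_freqs(freq_lists):
    """Unweighted average of several allele-frequency lists."""
    n = len(freq_lists)
    return [sum(column) / n for column in zip(*freq_lists)]


def apportion(races):
    """Split total diversity into three additive components.

    races maps a race label to a list of local populations, each given
    as a list of allele frequencies. Equal weighting is assumed.
    """
    race_means = [mean_freqs(pops) for pops in races.values()]
    h_species = shannon(mean_freqs(race_means))
    if h_species == 0.0:
        raise ValueError("no variation in the species; nothing to apportion")
    h_race = sum(shannon(f) for f in race_means) / len(race_means)
    h_pop = sum(
        sum(shannon(p) for p in pops) / len(pops) for pops in races.values()
    ) / len(races)
    return {
        "among races": (h_species - h_race) / h_species,
        "among populations within races": (h_race - h_pop) / h_species,
        "within populations": h_pop / h_species,
    }


# Invented placeholder frequencies (NOT real data).
example = {
    "region A": [[0.9, 0.1], [0.8, 0.2]],
    "region B": [[0.5, 0.5], [0.6, 0.4]],
}

for component, share in apportion(example).items():
    print(f"{component}: {100 * share:.1f} percent")
```

Whatever frequencies are supplied, the three components add up to 100 percent, in the same way that Lewontin's 6.3, 8.3, and 85.4 percent do.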
Since Lewontin’s original work, additional data have been collected for red blood cell and other genetic markers for many more populations. Different researchers, realizing the arbitrary nature of enumerating and categorizing different geographic races, have tried different clusterings of local populations that make up each race. Overall, the results are consistent: approximately 10 percent of the genetic variation in the human species is among races (geographic regions), 5 percent is among local populations within races, and 85 percent is within local populations. The same pattern was also found in a comprehensive analysis of newer DNA markers by Guido Barbujani and colleagues (1997): 11 percent among geographic regions, 5 percent among local populations within geographic regions, and 84 percent within local populations. Another study by Lynn Jorde and colleagues (2000) showed that although some genetic traits, such as mitochondrial DNA, have higher levels of variation among geographic regions, the majority of variation is still within local populations (roughly 70%).
The principles of apportionment of diversity have also been extended to complex physical traits, such as cranial length. Even though such traits are affected by nongenetic as well as genetic factors, it is possible to obtain a rough estimate of the percentage of variation among and within groups. John Relethford (2002) examined a global sample of cranial measures and found results very similar to those from genetic markers: 13 percent among geographic regions, 6 percent among populations within geographic regions, and 81 percent within local populations.
The major inference from these studies is that if the world is divided into a set of races, then the overwhelming amount of human genetic diversity exists within races (and most of that further exists within the local population), and consequently that race explains a relatively small fraction of the species’ diversity. This finding runs counter to views on race that emphasize group differences while minimizing variation within races.
To put it another way, the relatively low levels of variation among geographic races mean that there is a great deal of overlap in the distributions of most traits, including blood cell markers, DNA markers, and cranial measures. Thus, the idea of discrete races that are easily identifiable from one another based on allele frequencies (or measures of metric traits) does not hold up well. There is certainly variation in most traits, as well as a geographic patterning to such variation, because human populations in different parts of the world tend to differ somewhat from each other. However, the level of these differences, as estimated by FST and related statistics, is rather low.
SKIN COLOR, RACIAL CLASSIFICATION, AND FST
Not all traits, genetic or physical, show low levels of among-group variation. In some cases, there is a high level of variation among geographic races. However, these exceptions to the general rule do not provide evidence of the existence of discrete human races, but instead point to the action of natural selection operating on some traits to inflate the level of among-group variation. One example that is particularly relevant to the question of racial classification is skin color, a trait that is measured in human populations using a reflectance spectrophotometer, a device that measures the percentage of light reflected back from the skin at given wavelengths. John Relethford examined the apportionment of diversity using a global compilation of skin reflectance data and found that skin color showed the opposite pattern from that revealed by genetic markers and cranial measures. For skin color, the vast majority of variation was found to exist among geographic races (88%), with only 3 percent among local populations within geographic races, and 9 percent within local populations. These results are expected and intuitively obvious. For example, even though there is variation in skin color among indigenous Scots or indigenous Ethiopians, it is clear that the former have very light skin and the latter have very dark skin. Indeed, the large and easily noticeable difference in skin color across the globe is a reason that skin color factors into virtually every racial classification scheme that has been proposed.
However, the finding of a large level of among-group variation for skin color does not provide support for the existence of discrete races whose very definition was linked to skin color in the first place. If such discrete groups are so readily identifiable based on one trait, they should also be found based on other traits, but this is not the case. What needs to be explained is why skin color is so atypical when compared to all of the other genetic and physical traits that show low levels of among-group variation.
The answer is that skin color is affected differentially by natural selection across geographic space. Skin color shows a very strong correlation with latitude, so that indigenous populations near or at the equator tend to be the darkest, while populations farther away from the equator (north or south) tend to be lighter. This correlation has been linked to levels of ultraviolet radiation, which also vary by latitude: ultraviolet radiation levels are highest at or near the equator and lower farther away from the equator. A traditional explanation of the evolution of human skin color differences is as follows. In the human species’ past, darker skin was selected for in populations that lived in areas of high ultraviolet radiation, because darker skin is less prone to damage such as sunburn, skin cancer, and the photodestruction of folate, a needed nutrient. As human ancestors dispersed out of Africa, they moved into areas of lower ultraviolet radiation. For these groups, the problem of survival changed from danger due to too much ultraviolet radiation to danger from too little, such as lower levels of vitamin-D synthesis (ultraviolet radiation provided the major source of vitamin D in most human populations prior to modern times). It appears that, in this situation, lower levels of ultraviolet radiation selected for lighter skin in human populations. Although there is some debate over the exact factors responsible for changes in human skin color, there is little argument that natural selection has shaped the range of human skin color variation. The result has been the evolution of extreme levels of pigmentation in different environments geographically far apart, thus leading to an increased level of among-group variation.
Even if one ignores data showing low levels of racial differences and focuses on skin color, a closer examination shows that the geographic pattern of human skin-color variation does not fit a model of discrete racial groupings. Quite simply, skin color does not come in a finite number of shades, despite the repeated use of classificatory words such as “black,” “white,” and “brown.” Instead, the distribution of human skin color shows a gradient that is correlated with latitude. To put this in a nonstatistical frame of reference, imagine someone starting at the equator and walking north. As that person starts walking, the indigenous people he or she sees will tend to be very darkly pigmented. With continued walking, the average skin color will tend to become lighter and lighter. In other words, the walker will see one level of pigmentation blend into the next, with no apparent discontinuities, a pattern that is at odds with a discontinuous and discrete definition of race.
GEOGRAPHIC DISTANCE AND THE PATTERN OF GENETIC VARIATION
The majority of genetic variation in the human species exists within local populations, and a smaller fraction (typically about 10 to 15%) is found among geographic races. It is also important to consider the pattern of among-group variation as well as the magnitude. Human genetic variation typically follows a pattern known as “isolation by distance.” This means that the farther two populations are from one another geographically, the more genetically different they will be from one another. To test this model, genetic data are used to derive measures of genetic distance between pairs of populations, and these values are plotted as a function of the geographic distance between each pair of populations. Figure 2 shows an example of the relationship between genetic distance and geographic distance on a global scale using the genetic distances given by L. Luca Cavalli-Sforza and colleagues (1994). The figure clearly shows how the genetic differences between human populations are smallest among those populations that live close to each other, and how they increase among populations that are located farther away from each other. Similar results have been found for a variety of genetic data and cranial measures (Relethford 2004).
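The test described above, pairing each genetic distance with the corresponding geographic distance, can be sketched as follows. This is a minimal illustration, not the procedure used for Figure 2: it assumes straight great-circle distances (computed with the haversine formula) and a simple Pearson correlation. The population labels, coordinates, and genetic distances are invented placeholders, not real data.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented placeholder data (NOT real populations or measurements).
coords = {  # label -> (latitude, longitude) in degrees
    "P1": (0.0, 0.0),
    "P2": (10.0, 10.0),
    "P3": (30.0, 40.0),
    "P4": (50.0, 100.0),
}
genetic_distance = {
    ("P1", "P2"): 0.01, ("P1", "P3"): 0.04, ("P1", "P4"): 0.09,
    ("P2", "P3"): 0.03, ("P2", "P4"): 0.08, ("P3", "P4"): 0.05,
}

pairs = list(combinations(sorted(coords), 2))
geo = [great_circle_km(*coords[a], *coords[b]) for a, b in pairs]
gen = [genetic_distance[(a, b)] for a, b in pairs]

for (a, b), km, d in zip(pairs, geo, gen):
    print(f"{a}-{b}: {km:8.0f} km   genetic distance {d:.2f}")
print("correlation:", round(pearson_r(geo, gen), 3))
```

A positive correlation is the signature of isolation by distance. Because each population appears in several pairs, the pairs are not independent observations, so a permutation approach such as the Mantel test is generally needed to judge statistical significance. Distances between human populations are also often measured along plausible overland routes rather than straight lines.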
This pattern of isolation by distance is frequently found among populations in a small region, such as villages within a country, and it typically reflects the limiting effect of geographic distance on the movement of people, and hence on the movement of genes. Throughout human history and prehistory, the highest frequency of mating took place close to home, such that populations close to each other in space have tended to share more genes, all other things being equal. It is easy to see how geographic distance can limit movement, and this was particularly true in earlier times. It is less clear, however, if the global pattern of isolation by distance is completely due to the limiting impact of geographic distance. It is also quite likely that the human species’ origins played a role in structuring the geographic correlation of genetic diversity. Most anthropologists agree that modern human populations appeared first in sub-Saharan Africa somewhere between 130,000 and 195,000 years ago, followed by dispersion throughout the rest of the world. Although there is continuing debate over whether modern humans replaced or mixed with pre-existing humans outside of Africa (such as the Neandertals of Europe), the general finding of an African origin and dispersal is supported by both genetic and fossil evidence. Therefore, the correlation seen in the early twenty-first century between geographic distance and genetic distance may be a reflection of this dispersal.
Regardless of the relative impact of migration and population history, the important point here is that human genetic variation is geographically structured. The genetic differences that exist among human populations in distant parts of the planet have often been considered representative of racial differences, but the actual pattern of geographic variation is continuous and does not fit a model of discrete races.
An analysis of the pattern of genetic variation among living human populations does not provide support for a rigid application of the biological race concept to the human species. First, the amount of variation that exists among geographic races is relatively low, indicating a great deal of overlap in allele frequencies and measures of physical traits. Second, those traits that do show higher levels of racial differences, such as skin color, are atypical in this respect and reflect the evolutionary history of the trait. Third, the pattern of genetic differences among human populations is a reflection of geographic distance and migration history, and thus does not conform to a model of discrete and non-overlapping races.
It is also clear that denying an application of a strict definition of biological race does not mean that human genetic variation is nonexistent or that all human populations are genetically the same. A refutation of the race concept does not equate to a denial of variation. It is clear that there is genetic variation in the human species and that it is geographically structured. What continues to be debated in the “race question” is the best way to describe this variation and how well the race concept, other than as a first-order approximation, serves a descriptive function. An application of the concept of race is only a crude attempt to describe continuous variation in terms of discrete clusters, much as people attempt to reduce socioeconomic variation into “classes” or political orientation into “liberals” and “conservatives.” Imposing discrete labels on continuous variation is not necessarily bad, as long as one is careful not to reify those labels, and as long as there is some justification for their use over analyses that focus on local populations as the unit of evolution and analysis. In terms of analyzing human biological variation, it has long been known that subdividing the human species into races is at best an exercise in classification, but one that obscures the fine details of variation and explains little about the underlying causes of variation.
SEE ALSO Clines and Continuous Variation; Forensic Anthropology and Race; Gene Pool; Genetic Distance; Genetic Marker; Genetics, History of; Human and Primate Evolution; Human Genetics; “Out of Africa” Hypothesis; Racial Hierarchy; Skin Color; UNESCO Statements on Race.
BIBLIOGRAPHY
Barbujani, Guido, Arianna Magagni, Eric Minch, and L. Luca Cavalli-Sforza. 1997. “An Apportionment of Human DNA Diversity.” Proceedings of the National Academy of Sciences USA 94 (9): 4516–4519.
Brown, Ryan A., and George J. Armelagos. 2001. “Apportionment of Racial Diversity: A Review.” Evolutionary Anthropology 10 (1): 34–40.
Cavalli-Sforza, L. Luca, Paolo Menozzi, and Alberto Piazza. 1994. The History and Geography of Human Genes. Princeton, NJ: Princeton University Press.
Jablonski, Nina G., and George Chaplin. 2000. “The Evolution of Human Skin Coloration.” Journal of Human Evolution 39 (1): 57–106.
Jorde, Lynn B., et al. 2000. “The Distribution of Human Genetic Diversity: A Comparison of Mitochondrial, Autosomal, and Y-Chromosome Data.” American Journal of Human Genetics 66 (3): 979–988.
Lewontin, Richard C. 1972. “The Apportionment of Human Diversity.” Evolutionary Biology 6: 381–398.
Relethford, John H. 2002. “Apportionment of Global Human Genetic Diversity Based on Craniometrics and Skin Color.” American Journal of Physical Anthropology 118 (4): 393–398.
_____. 2003. Reflections of Our Past: How Human History Is Revealed in Our Genes. Boulder, CO: Westview.
_____. 2004. “Global Patterns of Isolation by Distance Based on Genetic and Morphological Data.” Human Biology 76 (4): 499–513.
Templeton, Alan R. 1998. “Human Races: A Genetic and Evolutionary Perspective.” American Anthropologist 100 (3): 632–650.
John H. Relethford