SAMPLING PROCEDURES
The analysis of data from samples constitutes a major proportion of contemporary research in the social sciences. For example, researchers use sample data from the U.S. population to estimate, with specified levels of confidence and precision, quantities such as average household size, the proportion of Americans who are unemployed during a given month, and the correlation between educational attainment and annual earnings among members of the labor force. Sample-based estimates are called sample statistics, while the corresponding population values are called population parameters. The most common reason for sampling is to obtain information about population parameters more cheaply and quickly than would be possible with a complete census of the population. Sampling is also sometimes used when it is not feasible to carry out a complete census. For example, except perhaps in a few nations with population registers, attempts to carry out a census in large countries invariably fail to enumerate everyone, and those who are missed tend to differ systematically from those who are enumerated (Choldin 1994). In these cases sampling may be the only way to obtain accurate information about population characteristics.
Researchers can use sample statistics to make inferences about population parameters because the laws of probability show that under specified conditions, sample statistics are unbiased estimators of population parameters. For example, if one used the same procedure to draw repeated samples from a population and determine the proportion of females in each sample, the average value of the observed sample proportions would equal the actual proportion of females in the population. The laws of probability also show that one can use data from a single sample to estimate how much a sample statistic will vary across many samples drawn from the population. Knowing the variability of a sample statistic (in statistical parlance, its sampling variance) in turn makes it possible to draw conclusions about the corresponding population parameter even though one has data for only a single sample. The key condition that must be met for these results to hold is that the data must come from a probability sample, a sample in which each case in the population (cases may be individuals, households, cities, days, etc.) has a known and nonzero probability of being selected into the sample. To produce this type of sample, a sampling technique must employ a random selection mechanism, one in which only the laws of chance determine which cases are included in the sample.
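This property can be checked with a short simulation. The following sketch (in Python, with an invented population of 10,000 cases of which 52 percent are female; all figures are illustrative) draws repeated samples and shows that the sample proportions average out to the population proportion:

    import random

    # Hypothetical population: 10,000 cases, 52 percent female (1 = female).
    population = [1] * 5200 + [0] * 4800

    # Draw many independent probability samples of n = 100 and record the
    # proportion of females observed in each.
    proportions = []
    for _ in range(2000):
        sample = random.sample(population, 100)
        proportions.append(sum(sample) / len(sample))

    # The average of the sample proportions is very close to 0.52, and the
    # spread of the proportions around 0.52 is the sampling variance.
    print(sum(proportions) / len(proportions))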
Researchers sometimes use nonprobability samples, such as convenience samples, quota samples, and snowball samples, to avoid the costs in time and money of probability sampling. Convenience samples, which consist of cases chosen simply because they are readily available (such as pedestrians passing by a street corner and volunteers from a classroom), sometimes are used in exploratory research and in the development of questionnaires or interview protocols. Opinion polls sometimes use quota samples to gauge political opinions cheaply: interviewers are assigned certain types of people to interview (e.g., a white female over age 50 living on a farm or a black male aged 20 to 30 living in a central city) but are free to select the specific individuals fitting these criteria. Snowball samples are generated when investigators ask known members of a group to identify other members and subsequently ask those who are so identified to identify still other members. This procedure takes advantage of the fact that the members of certain small and hard-to-locate groups (for example, biochemists studying a particular enzyme) often know one another. None of these nonprobability sampling procedures ensures that all the cases in the target population have a known and nonzero probability of being included in a sample, however (Kalton 1983, pp. 90–93). As a result, one cannot be confident that these procedures will provide unbiased estimates of population parameters, nor can one make valid statistical inferences from the samples they yield. Unless cost and time constraints are severe, researchers seeking to estimate population parameters therefore nearly always use procedures that yield probability samples.
One should distinguish between the representativeness of a sample and whether it was drawn by using probability sampling procedures. Although probability samples have a decidedly better track record in regard to representativeness, not all probability samples are representative and not all nonprobability samples are unrepresentative. Political polls based on quota samples, for example, often produce results that come very close to the subsequent vote. However, there is usually no reason to believe that a nonprobability sampling procedure that has been successful in the past will continue to yield representative results. In contrast, probability sampling procedures are likely to produce representative samples in the future because they are based on a random selection procedure.
Sampling theory, a branch of statistical theory, covers a variety of techniques for drawing probability samples. Many considerations can influence the choice of a sampling procedure for a given project, including feasibility, time constraints, characteristics of the population to be studied, desired accuracy, and cost. Simple sampling procedures are often sufficient for studying small, accessible, and relatively homogeneous populations, but researchers typically must use more complicated procedures to study large and heterogeneous populations. Using complicated procedures requires consultation with a sampling specialist at a survey organization (the University of Illinois Survey Research Laboratory provides a list of these organizations on its Web site: www.srl.uic.edu).
Any study in which a probability sample will be drawn must begin by defining the population of interest: the target population. The purpose of the study restricts the definition of the target population but rarely specifies it completely. For example, a study of characteristics of U.S. families obviously will define the population as consisting of families, but it will be necessary to define precisely what counts as a family as well as decide how to treat various cases from which it may be difficult to collect data (such as the families of U.S. citizens who live overseas). Sudman (1976, pp. 11–14) discusses general issues involved in defining target populations.
The next step in probability sampling is to construct a sampling frame that identifies and locates the cases in the target population so that they can be sampled. The most basic type of sampling frame is a list of the cases in the target population. Such lists are often unavailable, however, and so researchers usually must construct an alternative. For example, to draw a sample of U.S. public high schools, a researcher might begin with a list of U.S. census tracts, select a sample of those tracts, and then consult maps that indicate the locations of public high schools in the selected tracts. Here the sampling frame would consist of the list of census tracts and their corresponding maps.
A perfect sampling frame includes all the cases in the target population, no inappropriate cases, and no duplications. Most sampling frames are imperfect, however, with failure to include all the cases in the target population being the most serious type of coverage error. For example, telephone-number sampling frames, such as those employed in random-digit dialing procedures, do not cover people without a telephone, and sampling frames that are based on dwelling units do not cover homeless people. Undercoverage errors bias sample statistics, with the extent of the bias being positively related to (1) the proportion of the target population not covered by the sampling frame and (2) the magnitude of the difference between those covered and those not covered. Sampling experts have developed many methods to reduce coverage errors, including the use of multiple frames, multiplicity techniques, and postsurvey adjustments. Kish (1995, pp. 53–59, 384–439) and Groves (1989, pp. 81–132) provide helpful discussions of sampling-frame problems and possible solutions.
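The size of an undercoverage bias follows from a simple identity: the mean computed from the covered portion of the population differs from the target-population mean by the uncovered proportion times the difference between the covered and uncovered means. A minimal arithmetic sketch in Python, with invented figures:

    # Bias of a frame-based mean under undercoverage (illustrative numbers).
    # bias = uncovered share * (covered mean - uncovered mean)
    uncovered_share = 0.05      # e.g., households without telephones
    covered_mean = 52000.0      # hypothetical mean income among covered cases
    uncovered_mean = 31000.0    # hypothetical mean income among uncovered cases

    bias = uncovered_share * (covered_mean - uncovered_mean)
    print(bias)  # 1050.0: the frame overstates the population mean by $1,050

Both factors named above appear directly in the identity: the bias grows with the uncovered share and with the gap between the covered and uncovered groups.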
BASIC PROBABILITY SAMPLING PROCEDURES
Characteristics of one's sampling frame influence the specific sampling procedure appropriate for producing a probability sample. For example, some sampling procedures require that the sampling frame list all the cases in the population, while others do not. In addition, sampling procedures often are combined in situations where the sampling frame is complex. In all situations, however, the key element required for producing a probability sample is the use of a formally random procedure for selecting cases into the sample.
Simple Random Sampling. Simple random sampling (SRS) is the most elementary probability sampling procedure and serves as a benchmark for the evaluation of other procedures. To use SRS, one's sampling frame must list all the cases in the population. Usually the researcher assigns a unique identification number to each entry in the list and then generates random numbers with a random number table or a computer program. If a random number matches one of the identification numbers in the list, the researcher adds the indicated case to the sample (unless it has already been selected). This procedure is followed until it produces the desired sample size. It is important that only the randomly generated numbers determine the sample's composition; this condition ensures that the sampling procedure will be unbiased and that the chosen cases will constitute a probability sample.
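In practice this amounts to drawing identification numbers at random, without replacement, until the desired sample size is reached. A minimal Python sketch, using a hypothetical frame of 1,000 cases:

    import random

    # Hypothetical sampling frame: a complete list of the population's cases.
    frame = [f"case_{i:04d}" for i in range(1, 1001)]  # N = 1,000

    # random.sample() selects n distinct cases, each combination equally
    # likely, so only the laws of chance determine the sample's composition.
    srs = random.sample(frame, 50)
    print(srs[:5])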
With SRS, all cases in the sampling frame have an equal chance of being selected into the sample. In addition, for a sample of size n, all possible combinations of n different cases in the sampling frame have an equal chance of constituting the sample. The formulas for standard errors found in nearly all statistics textbooks and those used in statistical programs for computers assume that SRS generated the sample data. Most studies of human populations use sampling procedures that are less efficient than SRS, however, and using SRS formulas in these instances underestimates the sampling variances of the statistics. As a consequence, researchers frequently conclude that differences or effects are statistically significant when they should not do so, or they may report misleadingly small confidence intervals.
Systematic Simple Random Sampling. When a sampling frame contains many cases or the size of the prospective sample is large, researchers often decide to economize by setting a sampling interval and, after a random start, using that interval to choose the cases for the sample. For example, suppose a researcher wanted to select a sample of n cases from a population of size N and n/N = 1/25. To use systematic simple random sampling (SSRS), the researcher would draw a random number, r, between 1 and 25 and, starting with the rth case, select every twenty-fifth case in the sampling frame (for more complicated examples, see Kalton 1983, p. 17). This procedure gives all the cases in the frame an equal probability of being chosen for the sample but, unlike SRS, does not give all combinations of cases equal probabilities of selection. In the above example there are only 25 possible combinations of cases that could constitute the resulting sample (for example, cases 105 and 106 could never be in the same sample).
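A minimal Python sketch of SSRS, using a hypothetical frame of 1,000 cases and a sampling interval of 25:

    import random

    def systematic_sample(frame, interval):
        """Select every interval-th case after a random start in 1..interval."""
        start = random.randint(1, interval)   # the random start r
        return frame[start - 1::interval]     # cases r, r + k, r + 2k, ...

    frame = [f"case_{i:04d}" for i in range(1, 1001)]  # N = 1,000
    sample = systematic_sample(frame, 25)              # n/N = 1/25, so n = 40
    print(len(sample), sample[:3])

Note that only 25 distinct samples are possible, one for each value of the random start.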
When the order of the cases in the sampling frame is random with respect to the variables of interest in a study, this property of SSRS is inconsequential, but when the frame is cyclically ordered, the results of SSRS can differ significantly from those of SRS. For example, suppose one wished to sample starting players on college basketball teams to determine their average height and had a sampling frame ordered by team and, within each team, by position. Since there are five starting players on each team, a sampling interval of any multiple of 5 would yield a sample composed of players who all play the same position. There would be a 1 in 5 chance that these players would all be centers (usually the tallest players) and a 2 in 5 chance that they would all be guards (usually the shortest). Thus, in this instance the sampling variation of the players' mean height would be substantially greater than the variation that SRS would produce. However, there are also situations in which SSRS is equivalent to stratified random sampling (StRS) (see below) and yields samples that have smaller sampling variances than those from SRS (Kish 1995, pp. 113–123). In practice, most lists of entire populations have orderings, often alphabetical, that are essentially random with respect to the purposes of a study, and lists with potential problems usually are obvious or are quickly recognized. Thus, in most applications SSRS is essentially equivalent to SRS (Sudman 1976, pp. 56–57).
Stratified Random Sampling. When a sampling frame consists of a list of all the cases in a population and also contains additional information about each case, researchers may use StRS. For example, a list of people also might indicate the sex of each person. A researcher can take advantage of this additional information by grouping individuals of each sex into a sublist (called a stratum) and then sampling, using SRS or SSRS, from each stratum. One can use either the same sampling fraction for each stratum, in which case the procedure is called proportionate StRS, or different fractions for different strata (disproportionate StRS). In either case one usually attempts to use the additional information contained in the sampling frame to produce a sample that will be more efficient than one derived from other sampling procedures (i.e., it will need fewer cases to produce a sample with a given precision for estimating a population parameter).
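A minimal Python sketch of both variants, assuming a hypothetical frame in which each entry records a case's sex; equal within-stratum sampling fractions give proportionate StRS, and unequal fractions give disproportionate StRS:

    import random
    from collections import defaultdict

    # Hypothetical frame: (identifier, sex) pairs, 500 of each sex.
    frame = [(f"case_{i:04d}", "F" if i % 2 else "M") for i in range(1, 1001)]

    # Group the frame into strata using the auxiliary information.
    strata = defaultdict(list)
    for case_id, sex in frame:
        strata[sex].append(case_id)

    def stratified_sample(strata, fractions):
        """Draw an SRS within each stratum at the stratum's own fraction."""
        sample = []
        for name, cases in strata.items():
            sample.extend(random.sample(cases, round(fractions[name] * len(cases))))
        return sample

    proportionate = stratified_sample(strata, {"F": 0.10, "M": 0.10})
    disproportionate = stratified_sample(strata, {"F": 0.05, "M": 0.20})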
Efficiency is commonly measured by a sampling procedure's design effect, the ratio of the sampling variance of a statistic based on that procedure to the sampling variance of the same statistic derived from an SRS with the same number of cases (Kalton 1983, pp. 21–24). The efficiency of proportionate StRS is directly related to the correlation between the variable used to stratify the sampling frame and the variable or variables being studied. Thus, if one wished to determine the mean individual income of a population of Americans, proportionate StRS based on sex would produce a more efficient sample than would SRS and would have a design effect smaller than unity, because sex is correlated with income. In the limiting case in which the stratifying variable is perfectly correlated with the variable or variables being studied (for example, if each woman earned $15,000 per year and each man earned $25,000), proportionate StRS would always yield a sample mean exactly equal to the population mean. By contrast, if sex were completely uncorrelated with income, proportionate StRS would be no more efficient than SRS, and the design effect of StRS would equal unity. In practice it is usually difficult to obtain sampling frames that contain information about potential stratifying variables that are substantially correlated with the variables being studied, especially when the cases are individuals. As a result, the gains in efficiency produced by proportionate StRS are often modest.
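The limiting case is easy to verify by simulation. In the sketch below (illustrative figures only), every proportionate stratified sample reproduces the population mean exactly, so the sampling variance of the StRS mean, and hence its design effect, is zero:

    import random
    import statistics

    women = [15000] * 500   # every woman earns $15,000 (the limiting case)
    men = [25000] * 500     # every man earns $25,000
    population = women + men

    srs_means, strs_means = [], []
    for _ in range(1000):
        srs_means.append(statistics.mean(random.sample(population, 100)))
        # Proportionate StRS: the same 10 percent fraction in each stratum.
        strs_means.append(statistics.mean(random.sample(women, 50) +
                                          random.sample(men, 50)))

    print(statistics.pvariance(srs_means))   # > 0: SRS means vary
    print(statistics.pvariance(strs_means))  # 0: every StRS mean is $20,000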
Proportionate StRS often yields small sample sizes for strata that consist of small proportions of a population. Thus, when researchers want to estimate parameters for the individual strata in a population, they sometimes employ disproportionate StRS to ensure that there will be enough cases from each stratum in the overall sample. A second reason for using disproportionate StRS is to design an optimal sample, one that produces the most precise estimates for a given cost, when there are differences between the strata in terms of (1) the cost of sampling and obtaining data, (2) the variability of the variables under study, or (3) prior knowledge about the variables under study. Sudman (1976, pp. 107–130) discusses and gives examples of each of these situations. The benefits of disproportionate StRS may be hard to attain when one wants to draw a multipurpose sample with observations on many variables, however, because the optimal procedures for the different variables may conflict. In addition, although proportionate StRS cannot have a design effect greater than unity, the design effects for disproportionate StRS can be larger than unity, meaning that disproportionate StRS can produce samples that are less efficient than those derived from SRS (Kalton 1983, pp. 20–26).
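When the strata differ in per-case costs and in variability, a standard optimum-allocation rule (Neyman allocation in the equal-cost case) assigns each stratum a share of the sample proportional to its size times its standard deviation divided by the square root of its per-case cost. A sketch with invented strata:

    import math

    # Hypothetical strata: (size N_h, standard deviation S_h, cost per case c_h).
    strata = {"urban": (8000, 40.0, 1.0),
              "rural": (2000, 90.0, 4.0)}

    n_total = 500
    # Optimum allocation: n_h proportional to N_h * S_h / sqrt(c_h).
    weights = {h: N * S / math.sqrt(c) for h, (N, S, c) in strata.items()}
    total = sum(weights.values())
    allocation = {h: round(n_total * w / total) for h, w in weights.items()}
    print(allocation)  # {'urban': 390, 'rural': 110}

Here the rural stratum's high variability pulls sample cases toward it, while its higher cost per case pulls cases away; the allocation balances the two.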
Cluster Sampling. All the sampling procedures discussed above require that the researcher have a sampling frame that lists the cases in the target population. Unfortunately, such sampling frames rarely exist, especially for human populations defined by area of residence. One can still draw a probability sample, however, if the population can be organized in terms of a grouping principle and each case can be assigned to one of the groups (called clusters). For example, dwellings in cities are located in blocks defined by streets. Even if a list of dwellings does not exist, it is possible to draw a probability sample by constructing a sampling frame that consists of a listing of the blocks, drawing a random sample of the blocks, and then collecting data on the dwellings in the chosen blocks.
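A minimal Python sketch of this procedure, using a hypothetical frame of 200 blocks: blocks are selected at random, and every dwelling on a selected block enters the sample:

    import random

    # Hypothetical frame of clusters: city blocks and the dwellings on them.
    blocks = {f"block_{b:03d}": [f"dwelling_{b:03d}_{d}"
                                 for d in range(random.randint(5, 30))]
              for b in range(1, 201)}

    # Stage 1: draw an SRS of 20 blocks.
    chosen = random.sample(list(blocks), 20)
    # Stage 2: collect data on every dwelling in each chosen block.
    sample = [dwelling for b in chosen for dwelling in blocks[b]]
    print(len(sample))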
This procedure, which is called cluster sampling (CS), is also advantageous when one wishes to use face-to-face interviewing to survey geographically dispersed populations of individuals. In this case CS is less costly because it allows the survey to concentrate interviewers in a small number of locations, thus lowering traveling costs. However, CS usually produces samples that have larger sampling variances than those drawn from SRS. The efficiency of CS is inversely related to (1) the extent to which clusters are internally homogeneous and differ from each other and (2) the number of cases sampled from each cluster. CS is maximally efficient when a population can be divided into clusters that are identical, because each cluster will then be a microcosm of the population as a whole. When clusters are internally homogeneous and differ sharply from each other, as tends to be true for human populations clustered by area of residence, CS is considerably less efficient than SRS (Kalton 1983, pp. 30–33). In this situation, researchers usually attempt to select only a few cases from each of many clusters, but that strategy eliminates the cost savings of CS.
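A widely used approximation, associated with Kish, quantifies both influences at once: the design effect of CS is roughly 1 + (m − 1)ρ, where m is the number of cases taken per cluster and ρ is the intraclass correlation measuring within-cluster homogeneity. A brief illustration:

    def cluster_design_effect(m, rho):
        """Kish's approximation: deff = 1 + (m - 1) * rho."""
        return 1 + (m - 1) * rho

    print(cluster_design_effect(10, 0.05))  # 1.45: mildly homogeneous clusters
    print(cluster_design_effect(10, 0.50))  # 5.50: homogeneous clusters are costly
    print(cluster_design_effect(2, 0.50))   # 1.50: fewer cases per cluster helps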
Multistage Sampling. Researchers who want to collect data through face-to-face interviews with a probability sample of people living in a certain area, such as the United States, a state, or even a city, usually combine elements of the procedures discussed above in a multistage sampling procedure. For example, to draw a probability sample of U.S. adults, one might begin by obtaining a list of counties and parishes in the United States and collecting data on several characteristics of those units (region, average household income, etc.). These variables can be used to group the units, called primary sampling units, into strata so that one can use StRS. In addition, one would obtain estimates of the number of residents in each unit so that they could be sampled with probabilities proportional to their estimated population sizes (Sudman 1976, pp. 134–150). After selecting a sample of counties in this fashion, the researcher might proceed to draw a series of nested cluster samples. For example, one could divide each selected county into subareas (perhaps townships or other area-based governmental divisions) and then select a cluster sample from these units, with probabilities once again proportional to estimated population size. Next the researcher might divide each of the selected units into subareas (perhaps on the order of the U.S. Bureau of the Census's "blocks") and draw a cluster sample of them. For each chosen block, the researcher might obtain a list of dwelling units and draw another cluster sample. Finally, from each chosen dwelling unit the researcher would choose, according to a specified procedure (Kish 1995, pp. 396–404), an individual to be interviewed. It is crucial that the selection at each stage of the sampling process be based on a formally random selection procedure. For more detailed discussions and examples of multistage sampling procedures, see Kish (1995, pp. 301–383), Moser and Kalton (1972, pp. 188–210), and Sudman (1976, pp. 131–170). Multistage sampling usually requires considerable resources and expertise, and those who wish to draw such samples should contact a survey organization. Studies of the design effects of multistage samples, such as those carried out by the University of Michigan's Survey Research Center, show that they usually vary from 1.0 to 2.0, with values around 1.5 being common (Kish 1995, p. 581). Because standard errors scale with the square root of the design effect, a design effect of 1.5 means that the standard error of a statistic is about 22 percent larger than estimated by standard statistics programs, which assume simple random sampling. There is also variation across kinds of statistics, with univariate statistics, such as the mean, often having larger design effects than do bivariate statistics, such as regression coefficients (Groves 1989, pp. 291–292). Unfortunately, estimating standard errors for a multistage sample is usually a complicated task, and this complexity, combined with the fact that popular statistics programs for computers use only SRS formulas, has led most researchers to ignore the problem, producing many spurious "statistically significant" findings.
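The 22 percent figure can be checked directly. The helper below is a hypothetical illustration of the adjustment, not part of any particular statistics package:

    import math

    def adjusted_standard_error(srs_se, design_effect):
        """Inflate an SRS-based standard error for a complex design."""
        return math.sqrt(design_effect) * srs_se

    # sqrt(1.5) is about 1.22, so the true standard error is roughly
    # 22 percent larger than the SRS formula suggests.
    print(adjusted_standard_error(1.0, 1.5))  # 1.2247...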
RECENT ADVANCES
Sampling practitioners have made considerable progress in developing techniques for drawing probability samples of rare or elusive populations for which there are no lists and for which conventional multistage sampling procedures would produce sufficient cases only at an exorbitant cost. Sudman et al. (1988) review procedures for screening clusters to determine those that contain concentrations of a rare population's members and also discuss how multiplicity sampling procedures and capture-recapture methods can be applied to this problem. Researchers also have begun to use multiplicity sampling of individuals to draw probability samples of businesses and other social organizations to which individuals belong; Sudman et al. (1988) outline the general strategy involved, and Parcel et al. (1991) provide an informative discussion and an example. This approach also can produce "linked micro-macro samples" that facilitate contextual analyses.
Recent developments in statistical theory and computer software promise to make the calculation of standard errors for statistics based on multistage samples much easier. One approach to overcoming these difficulties is to use a computer program to draw many subsamples from an existing sample and then derive an overall estimate of a standard error from the many estimates given by the subsamples. There are several versions of this general approach, including "bootstrapping," "jackknife replication," and "cross-validation" (Hinkley 1983). A second approach is to develop computer statistical packages that incorporate information about the sampling design of a study (Wolter 1985, pp. 393–412, contains a list of such programs). The increased availability of such programs should produce greater recognition of the need to take a study's sampling procedure into account in analyzing the data the study yields.
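In its simplest form the resampling idea treats the observed sample as a stand-in for the population: one redraws many samples from it with replacement and uses the spread of the resulting estimates as the standard error. A minimal bootstrap sketch in Python, using simulated data (a real multistage design would require resampling whole clusters rather than individual cases):

    import random
    import statistics

    # Stand-in for an observed sample of 200 measurements.
    data = [random.gauss(50, 10) for _ in range(200)]

    # Draw many bootstrap resamples and compute the statistic in each.
    boot_means = []
    for _ in range(1000):
        resample = random.choices(data, k=len(data))  # with replacement
        boot_means.append(statistics.mean(resample))

    # The standard deviation of the resampled means estimates the standard
    # error of the sample mean (about 10 / sqrt(200) = 0.71 here).
    print(statistics.stdev(boot_means))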
There is now greater recognition that sampling error is just one of many types of error to which studies of human populations are subject. Nonsampling errors, including nonresponse error, interviewer error, and measurement error, also affect the accuracy of surveys. Groves (1989) comprehensively discusses both sampling and nonsampling errors and argues that minimizing one type can increase the other. Thus, decisions about sampling procedures need to take into account likely sources and magnitudes of nonsampling errors.
REFERENCES
Choldin, Harvey M. 1994 Looking for the Last Percent: The Controversy over Census Undercounts. New Brunswick, N.J.: Rutgers University Press.
Groves, Robert M. 1989 Survey Errors and Survey Costs. New York: Wiley.
Hinkley, David 1983 "Jackknife Methods." In Samuel Kotz and Norman L. Johnson, eds., Encyclopedia of Statistical Sciences, vol. 4. New York: Wiley.
Kalton, Graham 1983 Introduction to Survey Sampling. Newbury Park, Calif.: Sage.
Kish, Leslie 1995 Survey Sampling. New York: Wiley.
Moser, Claus A., and Graham Kalton 1972 Survey Methods in Social Investigation, 2nd ed. New York: Basic.
Parcel, Toby L., Robert L. Kaufman, and Leanne Jolly 1991 "Going Up the Ladder: Multiplicity Sampling to Create Linked Macro-to-Micro Organizational Samples." In Peter V. Marsden, ed., Sociological Methodology, vol. 21. Oxford, UK: Basil Blackwell.
Sudman, Seymour 1976 Applied Sampling. New York: Academic Press.
——, Monroe G. Sirken, and Charles D. Cowan 1988 "Sampling Rare and Elusive Populations." Science 240:991–996.
Wolter, Kirk M. 1985 Introduction to Variance Estimation. New York: Springer-Verlag.
Lowell L. Hargens