Sample Surveys
I. The Field (W. Edwards Deming)
II. Nonprobability Sampling (Alan Stuart)
There is hardly any part of statistics that does not interact in some way with the theory or the practice of sample surveys. The differences between the study of sample surveys and the study of other statistical topics are primarily matters of emphasis.
The field of survey research is closely related to the statistical study of sample surveys [see Survey analysis]. Survey research is more concerned with highly multivariate data and complex measures of relationship; the study of sample surveys has emphasized sampling distributions and efficient design of surveys.
I THE FIELD
The theory of sample surveys is mathematical and constitutes a part of theoretical statistics. The practice of sample surveys, however, involves an intimate mixture of subject matter (such as demography, psychology, consumer research, medicine, engineering) with theory. The germ of a study lies in the subject matter. Translation of a substantive question into a stimulus (question or test) enables man to inquire of nature and to quantify the result in terms of estimates of what the same inquiry would produce were it to cover every unit of the population.
Sampling, properly applied, does more. It furnishes, along with an estimate, an index of the precision thereof—that is, a margin of the uncertainty, for a stated probability, that one may reasonably ascribe to accidental variations of all kinds, such as variability between units (that is, between households, blocks, patients), variability of the interviewer or test from day to day or hour to hour, variations in coding, and small, independent, accidental errors in transcription and card punching.
The techniques of sampling also enable one to test the performance of the questionnaire and of the investigators and to test for differences between alternative forms of the questionnaire. They enable one to measure the extent of under-coverage or over-coverage of the prescribed units selected and also to measure the possible effects of differences between investigators and of departures from prescribed rules of interviewing and coding.
This article describes probability sampling, with special reference to studies of human populations, although the same theory and methods apply to studies of physical materials, to accounting, and to a variety of other fields. The main characteristic of probability sampling is its use of the theory of probability to maximize the yield of information for an allowable expenditure of skills and funds. Moreover, as noted above, the same theory enables one to estimate, from the results themselves, margins of uncertainty that may reasonably be attributed to small, accidental, independent sources of variation. The theory and practice of probability sampling are closely allied to the design of experiments.
The principal alternatives to probability sampling are judgment sampling and convenience sampling [see Sample Surveys, article on nonprobability sampling].
Uses of sampling
Probability sampling is used in a wide variety of studies of many different kinds of populations. Governments collect and publish monthly or quarterly current information in such areas as employment, unemployment, expenditures and prices paid by families for food and other necessaries, and condition and yield of crops.
In modern censuses only basic questions are asked of every person, and most census information is elicited for only a sample of the people, such as every fourth household or every twentieth. Moreover, a large part of the tabulation program is carried out only on a sample of the information elicited from everyone.
Sampling is the chief tool in consumer research. Samples of records, often supplemented by other information, furnish a basis on which to predict the impact that changes in economic conditions and changes in competitive products will have on a business.
Sampling is an important tool in supervision and is helpful in many other administrative areas, such as studies of use of books in a library to improve service and to make the best use of facilities.
Sampling—what is it?
Everyone acquires information almost daily from incomplete evidence. One decides on the basis of the top layer of apples in a container at the fruit vendor’s whether to buy the whole container. The top layer is a good sample of the whole if the apples are pretty well mixed; it is a bad sample and may lead to a regrettable purchase if the grocer has put the best ones on top.
The statistician engaged in probability sampling takes no chances on inferences drawn exclusively from the top layer or from any other single layer. He uses random numbers to achieve a standard degree of mixing, thereby dispersing the sample throughout the container and giving to every sampling unit in the frame an ascertainable probability of selection [see Random Numbers]. He may use powerful techniques of stratification, ratio estimation, etc., to increase accuracy and to decrease costs. For instance, in one type of stratified sampling he in effect divides the container of apples into layers, mixes the apples in each layer, and then takes a sample from each layer.
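The layer-by-layer selection just described can be sketched in Python. This is only an illustration under stated assumptions: the container contents, the function name `stratified_sample`, and the fixed seed are hypothetical, and Python's pseudo-random generator stands in for a table of random numbers.

```python
import random

def stratified_sample(layers, per_layer, seed=0):
    """Draw a simple random sample of `per_layer` units from every layer.

    `layers` maps a layer label to the list of units in that layer.
    Returns a dict mapping each label to the units selected from it.
    """
    rng = random.Random(seed)  # seeded generator, so the draw can be repeated
    return {label: rng.sample(units, per_layer) for label, units in layers.items()}

# Hypothetical container: three layers of ten apples each, identified by position.
container = {
    "top": [f"top-{i}" for i in range(10)],
    "middle": [f"middle-{i}" for i in range(10)],
    "bottom": [f"bottom-{i}" for i in range(10)],
}
sample = stratified_sample(container, per_layer=3)
```

In this sketch every apple has the same ascertainable probability of selection, 3/10, and no layer can be missed, whatever the grocer has put on top.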
Some history of sampling
Sir Frederick Morton Eden estimated the number of inhabitants of Great Britain in 1800 at nine million, using data on the average number of people per house in selected districts and on the total number of houses on the tax-rolls, with an allowance for houses not reported for taxation. The first census of Great Britain, in 1801, confirmed his estimate. Messance in 1765 and Moheau in 1778 obtained estimates of the population of France by multiplying the ratio of the number of inhabitants in a sample of districts to the number of births and deaths therein by the number of births and deaths reported for the whole country. Laplace introduced refinements in 1786 and calculated that 500,000 was the outside margin of error in his estimate of the population of France, with odds of 1,161 : 1. His estimate and its precision were more successful than those of the complete census of France that was attempted at the same time. [See Laplace.]
A. N. Kiaer used systematic selection in a survey of Norwegian workers in 1895, as well as in special tabulations from the census of Norway in 1900 and from the census of Denmark in 1901 and in a study of housing in Oslo in 1913.
Bowley in 1913 used a systematic selection of every twentieth household of working-class people in Reading (England) and computed standard errors of the results.
Tabulation of the census of Japan in 1921, brought to a halt by the earthquake of 1923, went forward with a sample consisting of the records of every thousandth household. The results agreed with the full tabulation, which was carried out much later. The Swedish extraordinary census of 1935 provides a good example of the use of sampling in connection with total registrations.
One strong influence on American practice came in the 1930s from Margaret H. Hogg, who had worked under Bowley. Another came when controversies over the amount of unemployment during the depressions of 1921 and 1929 called for improved methods of study—Hansen’s sample of postal routes for estimates of the amount of unemployment in 1936 gained recognition for improved methods; without it the attempt at complete registration of unemployed in the United States at the same time would have been useless.
Mahalanobis commenced in 1932 to measure the yield of jute in Bengal and soon extended his surveys to yields of rice and of other crops. In 1952 all of India came under the national surveys, the scope of which included social studies and studies of family budgets, sickness, births, and deaths. Meanwhile, the efforts of statisticians, mainly in India and England, had brought advances in methodology for estimation of yield per acre by random selection of small plots to be cut and harvested.
A quarterly survey of unemployment in the United States, conducted through interviews in a sample of households within a sample of counties, was begun in 1937. It was soon made monthly, and in 1942 it was remodeled much along its present lines (Hansen et al. 1953, vol. 1, chapter 9).
Sampling was used in the census of the United States in 1940 to extend coverage and to broaden the program of tabulation and publication. Tabulation of the census of India in 1941 was carried out by a 2 per cent sample. Subsequent censuses in various parts of the world have placed even greater dependence on sampling, not only for speed and economy in collection and tabulation but also for improved reliability. The census of France used sampling as a control to determine whether the complete Census of Commerce of 1946 was sufficiently reliable to warrant publication; the decision was negative (Chevry 1949). [For further history, see Stephan 1948. Some special references to history are contained in Deming (1943) 1964, p. 142. See also Statistics, article on the history of statistical method.]
Misconceptions about sampling
Sampling, of course, possesses some disadvantages: it does not furnish detailed information concerning every individual person, account, or firm; furthermore, error of sampling in very small areas and subclasses may be large. Many doubts about the value of sampling, however, are based on misconceptions. Some of the more common misconceptions will now be listed and their fallacies pointed out.
“It is ridiculous to think that one can determine anything about a population of 180 million people, or even 1 million people, from a sample of a few thousand.” The number of people in the country bears almost no relation to the size of the sample required to reach a prescribed precision. As an analogy (suggested by Tukey), consider a basket of black and white beans. If the beans are really mixed, a cupful would determine pretty accurately the proportion of beans that are black. The cupful would still suffice and would give the same precision for a whole carload of beans, provided the beans in the carload were thoroughly mixed. The problem lies in mixing the beans. As has already been noted, the statistician accomplishes mixing by the use of random numbers.
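A short calculation makes the point concrete. The sketch below uses the textbook standard error of a sample proportion under simple random sampling without replacement; the formula, the function name `se_proportion`, and the figures (a sample of 2,000 and a true proportion of one half) are assumptions chosen for illustration, not taken from the article.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling
    without replacement from a frame of N units."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return math.sqrt(p * (1 - p) / n * fpc)

n = 2000  # hypothetical sample size
se_million = se_proportion(0.5, n, 1_000_000)
se_nation = se_proportion(0.5, n, 180_000_000)
print(round(se_million, 4), round(se_nation, 4))
```

Both standard errors come out at about 0.011: multiplying the population by 180 leaves the precision of the same sample practically unchanged.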
“Errors of sampling are a hazard because they are ungovernable and unknown. Reliability of a sample is a matter of luck.” Quality and reliability of data are built in through proper design and supervision, with aid from the theory of probability. Uncertainty resulting from small, independent, accidental errors of a canceling nature and variation resulting from the use of sampling are in any case determinable afterward from the results themselves.
“Errors of sampling are the only danger that one has to worry about in data.” Uncertainty in statistical studies may arise from many sources. Sampling is but one source of error. [See below, and see also Errors, article on nonsampling errors.]
“Electronic data-processing machines, able to store and retrieve information on millions of items with great speed, eliminate any need of sampling.” This is a fanciful hope. The inherent accuracy of original records as edited and coded is the limitation to the accuracy that a machine can turn out. Often, complete records are flagrantly in error or fail to contain the information that is needed. Moreover, machine-time is expensive; sampling reduces cost by reducing machine-time.
“A ‘complete’ study is more reliable than a sample.” Data are the end product of preparation and of a long series of procedures: interviewing, coding, editing, punching, tabulation. Thus, error of sampling is but one source of uncertainty. Poor workmanship and structural limitations in the method of test or in the questionnaire affect a complete count as much as they do a sample. It is often preferable to use funds for improving the questionnaire and tests rather than for increasing the size of the sample.
Statistical parts of sampling procedure
A sampling procedure consists of ten parts. In the following list, M will denote those parts that are the responsibility of the expert on the subject matter, and S will denote those that are the responsibility of the statistician. (The technical terms used will be defined below.)
(a) Formulation of the problem in statistical terms (probability model) so that data will be meaningful (M, S). A problem is generated by the subject matter, not by statistical theory.
(b) Decision on the universe (M). The universe follows at once from a careful statement of the problem.
(c) Decision on the frame (M, S). Decision on the type and size of sampling units that constitute the frame (S).
(d) Procedure for the selection of the sample (S).
(e) Procedure for the calculation of estimates of the characteristics desired (averages, totals, proportions, etc.) (S).
(f) Procedure for the calculation of standard errors (S).
(g) Design of statistical controls, to permit detection of the existence and extent of various non-sampling errors (S).
(h) Editing, coding, tabulation (M, S).
(i) Evaluation of the statistical reliability of the results (S).
(j) Uses of the data (M).
Definitions of terms
The technical terms that have been used above and that will be needed for further discussion will now be defined.
Universe of study
The universe consists of all the people, firms, material, conditions, units, etc., that one wishes to study, whether accessible or not. The universe for any study becomes clear from a careful statement of the problem and of the uses intended for the data. Tabulation plans disclose the content of the universe and of the information desired. Examples of universes are (i) the housewives aged 20-29 that will live in the Metropolitan Area of Detroit next year, (ii) all the school children in a defined area, (iii) all the pigs in a country, both in rural areas and in towns.
Frame
The frame is a means of access to the universe (Stephan 1936) or to enough of the universe to make a study worthwhile. A frame is composed of sampling units. A sampling unit commonly used in house-to-house interviewing is a compact group or segment of perhaps five consecutive housing units. A frame is often a map, divided up—either explicitly or tacitly—into labeled areas. In a study concerned with professional men, for example, the frame might be the roster of membership of a professional society, with pages and lines numbered. The sampling unit might be one line on the roster or five consecutive lines.
Without a frame probability sampling encounters numerous operational difficulties and inflated variances (see, for example, the section “Sampling moving populations,” below).
In the types of problems to be considered here (with the exception of those treated in the section “Sampling moving populations,” below) there will be a frame, and every person, or every housing unit, will belong to one sampling unit, or will have an ascertainable probability of belonging to it. In the sampling of stationary populations, a sampling procedure prescribes rules by which it is possible to give a serial number to any sampling unit, such as a small area. A random number will then select a definite sampling unit from the frame and will lead to investigation of all or a subsample of the material therein that belongs to the universe.
Selection of persons within a dwelling unit
Some surveys require information concerning individuals, and in such cases it may be desirable, for various reasons (contagion, fatigue, and so on), to interview only one eligible person in a dwelling unit that lies in a selected segment. In such surveys, the interviewer may make a list of the eligible people in each dwelling unit that falls in the sample and may select therefrom, on the spot, by a scheme based on random numbers, one person to interview. Appropriate weights are applied in tabulation (Deming 1960, p. 240).
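The on-the-spot selection can be sketched as follows. The sketch assumes that the one person is chosen with equal probability from the list of eligible people, so that the weight applied in tabulation is the number of eligible people in the dwelling unit (the inverse of the probability of selection within it); the function name and the listings are hypothetical, and the weighting actually used in a given survey may differ.

```python
import random

def select_respondent(eligible, rng):
    """Pick one eligible person at random; return (person, weight).

    Each person is chosen with probability 1/len(eligible), so the
    weight applied in tabulation is len(eligible).
    """
    person = rng.choice(eligible)
    return person, len(eligible)

rng = random.Random(1)  # stands in for the interviewer's random numbers
dwellings = [["Ann"], ["Bo", "Cy"], ["Di", "Ed", "Flo"]]  # hypothetical listings
picks = [select_respondent(people, rng) for people in dwellings]
weights = [weight for _, weight in picks]
```

Without the weights, people who live in large households would be under-represented in the tabulations.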
Nominal frame and actual frame
One must often work with a frame that fails to include certain areas or classes that belong to the universe. A list of areas that contain normal families may not lead to all the consumers of a product, as some consumers may live in quasi-normal quarters, such as trailers and dormitories. Extension of the sampling procedure into these quarters may present problems. Fortunately, the proportion of people in quasi-normal households is usually small (mostly 1 per cent to 3 per cent in American cities), and one may therefore elect to omit them.
A frame may be seriously impaired if it omits too much of certain important classes that by definition belong to the nominal frame. It is substantive judgment, aided by calculation, that must decide whether a proposed frame is satisfactory.
Sampling from an incomplete frame
Almost every frame is in some respects out of date at the time of use. It is often possible, however, to use an obsolete or incomplete frame in a way that will erase the defects in the areas that fall into the sample. One may, for example, construct rules by which to select large sampling units from an incomplete frame and then to amend those units, by local inquiry, in order to bring them up to date. Selection of small areas within the larger area, with the appropriate probability, will maintain the prescribed over-all probability of selection.
Sampling for rare characteristics
One sometimes wishes to study a rare class of people when there is no reliable list of that class. One way to accomplish this is to carry out a cheap, rapid test in order to separate a sample of households into two groups (strata)—one group almost free of the rare characteristic, the other heavily populated with it—and then to investigate a sample drawn from each group. Optimum sampling fractions and weights for consolidation may be calculated by the theory of stratified sampling (discussed below; see also Kish in Symposium …, 1965).
Equal complete coverage of a frame
The equal complete coverage of a frame is by definition the result that would be obtained from an investigation of all sampling units in a given frame, carried out by the same field workers or inspectors, using the same definitions and procedures, and exercising the same care as they exercised on the sample, and at about the same period of time. The adjective “equal” signifies that the same methods must be used for the equal complete coverage as for the sample.
Some operational definitions
Sampling error. Suppose that for a given frame, sampling units bear the serial numbers 1, 2, 3, and on to N. However it be carried out, and whatever be the rules for coding and for adjustment for nonresponse, a complete coverage of the N sampling units of the frame would yield the numerical values
a1, a2, a3, …, aN for x,
b1, b2, b3, …, bN for y.
In a survey of unemployment, for example, the x-characteristic of a person might be the property of being unemployed and his y-characteristic the property of belonging to the labor force. Then a1, the x-population of sampling unit No. 1 (which might consist of five successive households), would be the count of people that have the x-characteristic in that sampling unit. That is, a1 would be the count of unemployed persons in the five households. Similarly, b1, the y-population, would be the count of people in the labor force in those same households. Then a1/b1 would be the proportion unemployed in the sampling unit of five households.
Again, x might refer to expenditure for bread and y to expenditure for all food. Then a1/b1 would be the proportion of money that goes for bread in sampling unit No. 1, expenditure for all food being the base.
Here, the people with the x-characteristic form a subclass of those with the y-characteristic, but this may not be so in other surveys. Thus, the x-characteristic and the y-characteristic might form a dichotomy, such as passed and rejected or male and female. One often deals with multiple characteristics, but two will suffice here.
Denote the sum of the x-values and of the y-values in the N sampling units by
A = a1 + a2 + a3 + ··· + aN = Na = x-total,
B = b1 + b2 + b3 + ··· + bN = Nb = y-total,
which makes a and b the average x-value and the average y-value per sampling unit in the frame, as in Table 1. For example, A might be the total number unemployed in the whole frame and B the total number of people in the labor force. Then ϕ = A/B would be the proportion of people in the labor force that are unemployed.
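A small numerical illustration of this notation follows. The frame of four sampling units and its counts are hypothetical, chosen only to show how A, B, the averages, and ϕ are formed.

```python
# Hypothetical frame of N = 4 sampling units (each, say, five households).
a = [1, 0, 2, 1]   # unemployed persons in each sampling unit (x-values)
b = [6, 5, 8, 6]   # persons in the labor force in each unit (y-values)

N = len(a)
A = sum(a)          # x-total
B = sum(b)          # y-total
a_bar = A / N       # average x-value per sampling unit (a in the text)
b_bar = B / N       # average y-value per sampling unit (b in the text)
phi = A / B         # proportion of the labor force that is unemployed
```

Here A = 4, B = 25, and ϕ = 0.16, so 16 per cent of the labor force in this hypothetical frame is unemployed.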
An operational definition of the sampling process and of the consequent error of sampling is contained in the following experiment.
(a) Take for the frame N cards, numbered serially 1 to N. Card i shows ai and bi for the values of the x-characteristic and y-characteristic.
(b) Draw a sample of n cards, following the specified sampling procedure (which will invariably require selection by random numbers).
Table 1 illustrates the notation for the frame and for the results of a sample. The serial numbers on the cards in the sample are not their serial numbers in the frame but denote instead the ordinal number as drawn by random numbers. Sample card No. 1 could be any card from 1 to N in the frame. In general, another sample would be composed of different cards, as the drawings are random.
Table 1. Some notation for frame and sample

Serial number of sampling unit | Frame: x-value | Frame: y-value | Serial number in order drawn in sample | Sample: x-value | Sample: y-value |
---|---|---|---|---|---|
1 | a1 | b1 | 1 | x1 | y1 |
2 | a2 | b2 | 2 | x2 | y2 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
N | aN | bN | n | xn | yn |
Total | A | B | | x | y |
Average per sampling unit | a = A/N | b = B/N | | x̄ = x/n | ȳ = y/n |
Variance* | σa² = (1/N) Σ (ai − a)² | σb² = (1/N) Σ (bi − b)² | | sx² = (1/n) Σ (xi − x̄)² | sy² = (1/n) Σ (yi − ȳ)² |
Standard deviation | σa | σb | | sx | sy |

*Some authors define variances by means of N − 1 and n − 1 rather than N and n.

(c) Form estimates by the formulas specified in the sampling plan. For illustration, one may form, from the sample, estimators like

$$\bar{x} = \frac{x}{n}, \tag{1}$$

$$\bar{y} = \frac{y}{n}, \tag{2}$$

$$X = N\bar{x}, \tag{3}$$

$$Y = N\bar{y}, \tag{4}$$

$$f = \frac{x}{y} = \frac{\bar{x}}{\bar{y}}. \tag{5}$$

If (1), (2), (3), (4), and (5) are used as estimators of a, b, A, B, and ϕ, respectively, and if the results of the complete coverage were known, then one could, for any experiment, compute errors of sampling, such as

$$\Delta\bar{x} = \bar{x} - a, \qquad \Delta X = X - A, \qquad \Delta f = f - \phi.$$
It is an exciting fact that a single sample, provided that it is big enough (usually 25, 30, or more sampling units), and provided that it is designed properly and skillfully in view of possible statistical peculiarities of the frame and is carried out in reasonable conformance with the specifications, will make possible an estimate, based on theory to follow later, of the important characteristics of the distribution of sampling variation of all possible samples that could be drawn from the given equal complete coverage and processed by the specified sampling procedure.
Standard error and mathematical bias
We continue our conceptual experiment.
(d) Return the sample of n cards to the frame, and repeat steps (b) and (c) by the same sampling procedure, to form a new sample and new estimates x̄, ȳ, f. Repeat these steps again and again, 10,000 times or more.
Explicit statements will now be confined to x. The 10,000 experiments give an empirical distribution for x̄, by which one may count the number of samples for which x̄ lies between, for example, 100 and 109. We visualize an underlying theoretical distribution of x̄, which the empirical distribution approaches closer and closer as the number of repetitions increases.
We are typically concerned with relationships between (i) the empirical distribution of x̄ and (ii) the theoretical distribution of x̄, for the given sampling procedure. Study of these relationships helps in the use of sampling for purposes of making estimates of characteristics of the frame.
Let a be the characteristic of the complete coverage that the generic symbol x̄ estimates. Then if

$$E\bar{x} = a,$$

the sampling procedure is said to be unbiased. (The symbol E denotes expectation, the mean of the theoretical distribution of x̄.) But if

$$E\bar{x} = a + C, \qquad C \neq 0,$$

the sampling procedure has the mathematical bias C. In any case, the variance of the distribution of x̄ is

$$\sigma_{\bar{x}}^{2} = E(\bar{x} - E\bar{x})^{2},$$

and its square root, σx̄, is the standard error of the sampling procedure for the estimator x̄. Thus, a sampling procedure has, for any estimator, an expected value, a standard error, and possibly a mathematical bias (see the section “Possible bias in ratio estimators,” below).
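The conceptual experiment of steps (a) to (d) can be imitated on a computer. The sketch below is a minimal illustration, assuming a hypothetical frame of 200 cards, samples of 30 cards drawn without replacement, and the sample mean as the estimator; the names `sampling_distribution`, `frame`, and the seeds are illustrative choices.

```python
import random
import statistics

def sampling_distribution(frame, n, repetitions, seed=0):
    """Repeat the draw of n cards without replacement; return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(frame, n)) for _ in range(repetitions)]

# Hypothetical frame: x-values a_1, ..., a_N written on N = 200 cards.
frame_rng = random.Random(42)
frame = [frame_rng.randint(0, 10) for _ in range(200)]
a = statistics.fmean(frame)            # value given by the complete coverage

means = sampling_distribution(frame, n=30, repetitions=10_000)
expected = statistics.fmean(means)     # empirical stand-in for the expectation of the mean
bias = expected - a                    # close to 0 for this procedure
std_error = statistics.pstdev(means)   # empirical standard error of the mean
```

With this many repetitions the empirical mean of the 10,000 estimates should lie very close to a, in line with the unbiasedness of the sample mean under this procedure, and `std_error` approximates the standard error of the procedure.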
Uncertainty from accidental variation
Under the conditions stated above, the margin of uncertainty in the estimator x̄ that is attributable to sampling and to small, independent, accidental variations, including random error of measurement (Type III in the next section), may be estimated, for a specified probability, as tσ̂x̄, where σ̂x̄ is an estimator of σx̄. The factor t depends on the probability level selected for the margin of uncertainty (which will in turn depend on the risks involved) and also on the number of degrees of freedom in the estimator σ̂x̄. In large samples the distributions of most estimators are nearly normal, except for frames that exhibit very unusual statistical characteristics. The standard deviation, σ̂x̄, then contains nearly all the information regarding the margin of uncertainty of x̄ that is attributable to accidental variation. Presentation of the results of a survey requires careful consideration when there is reason to question the approximate normality of estimators (Fisher 1956, p. 152; Shewhart 1939, p. 106).
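As a rough illustration of such a margin, the sketch below computes t times the estimated standard error for a simple random sample drawn without replacement. Several points are assumptions made for the example: the factor t is taken from the normal distribution, which is reasonable only for large samples; the variance is estimated with the divisor n − 1; and the frame, seed, and function name `margin_of_uncertainty` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def margin_of_uncertainty(sample, N, probability=0.95):
    """Return (x_bar, t * estimated standard error) for a simple random sample."""
    n = len(sample)
    x_bar = fmean(sample)
    # Estimated standard error of x_bar, with the finite multiplier 1/n - 1/N.
    se_hat = stdev(sample) * math.sqrt(1 / n - 1 / N)
    # Large-sample factor taken from the normal distribution (an assumption).
    t = NormalDist().inv_cdf(0.5 + probability / 2)
    return x_bar, t * se_hat

rng = random.Random(7)
frame = [rng.gauss(50, 10) for _ in range(5000)]   # hypothetical frame
sample = rng.sample(frame, 100)
x_bar, margin = margin_of_uncertainty(sample, N=len(frame))
```

For a small number of degrees of freedom, the normal factor would be replaced by the corresponding value of Student's t.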
Random selection
It is never safe to assume, in statistical work, that the sampling units in a frame are already mixed. A frame comes in layers that are different, owing to geographic origin or to order of production. Even blood, for example, has different properties in different parts of the body.
A random variable is the result of a random operation. A system of selection that depends on the use, in a standard manner, of an acceptable table of random numbers is acceptable as a random selection. Methods of selection that depend on physical mixing, shuffling, drawing numbers out of a hat, throwing dice, are not acceptable as random, because they have no predictable behavior. Neither are schemes that merely remove the choice of sampling units from the judgment of the interviewer. Pseudo-random numbers, generated under competent skill, are well suited to certain types of statistical investigation [see Random Numbers].
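Selection by serial number can be sketched with a seeded pseudo-random generator. The example is illustrative only: Python's generator stands in for a table of random numbers, and the frame size, sample size, seed, and function name `select_units` are hypothetical.

```python
import random

def select_units(N, n, seed):
    """Select n distinct serial numbers from 1..N, each unit equally likely."""
    rng = random.Random(seed)  # recording the seed makes the selection repeatable
    return sorted(rng.sample(range(1, N + 1), n))

chosen = select_units(N=500, n=10, seed=3)
```

Because the seed is recorded, the selection can be reproduced and audited later, which physical mixing or shuffling does not allow.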
Types of uncertainty in statistical data
All data, whether obtained by a complete census or by a sample, are subject to various types of uncertainty. One may reduce uncertainties in data by recognizing their existence and taking steps for improvement in future surveys. Sample design is an attempt to strike an economic balance between the different kinds of uncertainty. There is no point, for example, in reducing sampling error far below the level of other uncertainties.
Three types of uncertainty
The following discussion will differentiate three main types of uncertainty.
Type I
Uncertainty of Type I comprises built-in deficiencies, or structural limitations, of the frame, questionnaire, or method of test.
Any reply to a question, or any record made by an instrument, is only a response to a stimulus. What stimulus to apply is a matter of judgment. Deficiencies in the questionnaire or in the method of test may therefore arise from incomplete understanding of the problem or from unsuitable methods of investigation. Structural limitations are independent of the size or kind of sample. They are built in: a recanvass will not discover them, nor will calculation of standard errors or other statistical calculations detect them.
Some illustrations of uncertainty of Type I are the following:
(a) The frame may omit certain important segments of the universe.
(b) The questionnaire or method of test may fail to elicit certain information that is later found to be needed. The questionnaire may contain inept definitions, questions, and sequences. Detailed accounting will give results different from those given by mere inquiry about total expenditure of a family for some commodity; date of birth gives a different age from that given in answer to the simple question, How old are you? There may be differential effects of interviews depending on such variables as sex and race of the interviewer.
(c) Use of telephone or mail may yield results different from those obtained by personal interview.
(d) Judgments of coders or of experts in the subject matter may differ.
(e) The date of the survey has an important effect on some answers.
Type II
Uncertainty of Type II includes operational blemishes and blunders—for example:
(f) One must presume the existence of errors of a noncanceling nature (persistent omission of sampling units designated, persistent inclusion of sampling units not designated, persistent favor in recording results).
(g) One must presume the existence of bias from nonresponse.
(h) Information supplied by coders for missing or illegible entries may favor high or low values.
(i) There may be a large error, such as a unique blunder.
Type III
Uncertainty of Type III is caused by random variation. Repeated random samples drawn from the same frame will give different results. Besides, there are inherent uncorrelated, nonpersistent, accidental variations of a canceling nature that arise from inherent variability of investigators, supervisors, editors, coders, punchers, and other workers and from random error of measurement.
Standard error of an estimator
The standard error of a result includes the combined effects of all kinds of random variation, including differences within and between investigators, supervisors, coders, etc. By proper design, however, it is possible to get separate estimates of some of these differences.
A small standard error of a result signifies (i) that the variation between repeated samples will be small and (ii) that the result of the sample agrees well with the equal complete coverage of the same frame. It usually tells little about uncertainties of Type II and never anything about uncertainties of Type I.
Limitations of statistical inference
Statistical inference (estimates, standard errors, statistical tests) refers only to the frame that was sampled and investigated. No statistical calculation can, by itself, detect or measure nonsampling errors, although side experiments or surveys may be helpful. No statistical calculation can detect defects in the frame. No statistical calculation can bridge the gap between the frame covered and the universe. This is as true of probability sampling as it is of judgment sampling, and it is true for a complete census of the frame as well.
Comparison of surveys
Substantial differences in results may come from what appear to be inconsequential differences in questionnaires or in methods of hiring, training, and supervision of interviewers and coders or in dates of interviewing. The sampling error in a sample is thus not established by comparison against a complete census unless the complete census is the equal complete coverage for the sample.
Recalls on people not at home
Many characteristics of people that are not at home at first call, or that are reluctant to respond, may be very different from the average. What is needed is response from everyone selected, including those that are hard to get. To increase the initial size of the sample is no solution. Calculations that cover a wide variety of circumstances show that the amount of information per dollar expended on a survey increases with the number of recalls, the only practicable limit being the time for the completion of the survey. Good sample design therefore specifies that four to six well-timed recalls be made or specifies that recalls continue until the level of response reaches a prescribed proportion. Special procedures, such as intensive subsampling of those not at home on the first or second call, have been proposed (see Leven 1932; Hansen & Hurwitz 1946; Deming 1960).
Surveys by post
One can often effect important economies by starting with a mail survey of a fairly large sample properly drawn from a given frame, then finishing with a final determined effort in the form of personal interviews on all or a fraction (one in two or one in three) of the people that failed to reply (Leven 1932). Mail surveys require a frame, in the form of a list of names with reasonably accurate addresses, and provision for keeping records of mailings and of returns.
They are therefore especially adaptable to surveys of members of a professional society, subscribers to a journal, or subscribers to a service. [For further discussion of mail surveys, see Errors, article on Nonsampling errors.]
Simple designs for enumerative purposes
The aim in an enumerative study is to count the number of people in an area that have certain characteristics or to estimate a quantity, perhaps their annual income, regardless of how they acquired these characteristics. The aim in an analytic study is to detect differences between classes or to measure the effects of different treatments.
For illustration consider a study of schizophrenics. One enumerative aim might be to estimate the number of children born to schizophrenic parents before onset of the disease or before the first admission of one of the parents to a hospital for mental diseases. Further aims of the same study might be analytic, such as to discover differences in fertility or in duration of hospitalization caused by different treatments, differences between communities, or differences between time periods.
The finite multiplier typified by 1/n − 1/N (to be seen later) appears in estimators for enumerative purposes. It has no place in estimators for analytic purposes.
Optimum allocation of effort for an enumerative aim may not be optimum for an analytic aim. Moreover, what is optimum for one enumerative characteristic may not be optimum for another. Hence, it will usually be necessary to compromise between competitive aims.
Enumerative aims will occupy most of the remaining space in this article.
The theory presented in this section is for the design commonly called simple random sampling. This is often a practicable design, and the theory forms a base for more complex designs.
A simple procedure of selection and some simple estimators
Definitions of “frame,” “sample,” and other terms were introduced above. In addition, it will be convenient to define the coefficient of variation. For the x-population and y-population of the frame, the coefficients of variation are defined as
In like manner, the symbol Cx denotes the coefficient of variation of the empirical or theoretical distribution of the random variable x. The square, Cx², of the coefficient of variation Cx is called the rel-variance of x. The coefficient of variation is especially useful for characteristics (such as height) that are positive. It is often helpful to remember, for example, that Cx̄ = CX = Cx because x, x̄, and X are constant multiples of each other.
The procedure of selection specified earlier gives every member of the frame the same probability of selection as every other member, wherefore
That is, x̄ and ȳ are unbiased estimators of a and b, respectively. Moreover,
are unbiased estimators of A and B. Often, a ratio such as
is of special interest. The sample gives
as an estimator of ϕ. If the total y-population, B, is known from another source, such as a census, A may be estimated by the formula
This estimator X’ is called a ratio estimator. It will be more precise than the estimator X = Nx̄ in (14) if the correlation between xi and yi is high. Other estimators will be discussed later (for example, regression estimators). Theory provides a basis for the choice of estimator.
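The contrast between the two estimators of A can be sketched in a few lines of Python. The frame values, the sample, and the function names below are hypothetical and serve only as an illustration; the sketch assumes the expansion estimator X = Nx̄ and the ratio estimator X′ = Bf with f = x̄/ȳ, as described in the text.

```python
from statistics import mean

def expansion_estimate(x_sample, N):
    """X = N * x-bar: expand the sample mean by the frame size N."""
    return N * mean(x_sample)

def ratio_estimate(x_sample, y_sample, B):
    """X' = B * f, where f = x-bar / y-bar and B is the known y-total."""
    f = mean(x_sample) / mean(y_sample)
    return B * f

# Hypothetical frame of N = 8 units in which x is nearly proportional to y.
y_frame = [10, 12, 15, 20, 22, 30, 35, 40]
x_frame = [21, 23, 31, 41, 43, 61, 69, 81]
N, B, A = len(y_frame), sum(y_frame), sum(x_frame)

sample_ids = [0, 3, 6]  # an illustrative sample of n = 3 units
xs = [x_frame[i] for i in sample_ids]
ys = [y_frame[i] for i in sample_ids]

print("true total A:", A)
print("X  = N * x-bar:", expansion_estimate(xs, N))
print("X' = B * f    :", ratio_estimate(xs, ys, B))
```

With x and y highly correlated, as in this made-up frame, the ratio estimate lies much closer to the true total than the simple expansion does.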
Possible bias in ratio estimators
Necessary and sufficient conditions for there to be no bias in f as an estimator of ϕ are that Ey ≠ 0 and that xi/yi and yi be uncorrelated—that is, that E[(x/y)y] = E(x/y)Ey. In practice, if bias exists at all, it is usually negligible when the sample contains more than three or four sampling units.
Sampling with and without replacement
Usually, in the sampling of finite populations, one permits a sampling unit to come into the sample only once. In statistical language, this is sampling without replacement. Tests of physical materials are sometimes destructive, and a second test would be impossible. To draw without replacement, one simply disregards a random number that appears a second time (or uses tables of so-called random permutations).
There are circumstances, however, in which one accepts the random numbers as they come and permits a sampling unit to come into the sample more than once. This is sampling with replacement.
Hereafter, most equations will be written for sampling without replacement. It is a simple matter to drop the fraction 1/N from any formula to get the corresponding formula for sampling with replacement. Actually, in practice, samples are usually such a small part of the frame that the fraction 1/N is ignored, even though the sampling be done without replacement.
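The two modes of drawing can be sketched as follows. This is a minimal illustration, not a prescribed procedure; the function names are hypothetical, and Python's pseudo-random generator stands in for a table of random numbers.

```python
import random

def draw_without_replacement(N, n, rng):
    """Read random numbers between 1 and N, disregarding repeats."""
    if n > N:
        raise ValueError("cannot draw more than N distinct units")
    chosen, seen = [], set()
    while len(chosen) < n:
        r = rng.randint(1, N)
        if r not in seen:  # a number that appears a second time is skipped
            seen.add(r)
            chosen.append(r)
    return chosen

def draw_with_replacement(N, n, rng):
    """Accept the random numbers as they come; a unit may recur."""
    return [rng.randint(1, N) for _ in range(n)]

rng = random.Random(12345)  # fixed seed so that the draw can be repeated
print(draw_without_replacement(50, 10, rng))
print(draw_with_replacement(50, 10, rng))
```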
Variances
The variances of the estimator x̄ derived from the sampling procedure described earlier are
(The sign ≅ indicates an approximation that is sufficiently close in most practice.)
Similar expressions hold for ȳ. For the ratio f = x̄/ȳ, the approximation
is useful if n be not too small. Here
is the rel-covariance of the x-population and y-population per sampling unit in the frame.
When the ratio estimator of the total x-population is derived as in eq. (17), eq. (19) gives the same approximation for
Estimate of aggregate characteristic—number of units in class unknown
It often happens in practice that one wishes to estimate the aggregate value of some characteristic of a subclass of a group when the total number of units in the subclass is unknown. For example, one might wish to estimate the aggregate income of women aged 15 or over that live in a certain district, are gainfully employed, and have at least one child under 12 years old at home (this specification defines the universe). The number of women that meet this specification is not known. An estimate of the average income per woman of this specification, prepared from a sample, suffers very little from this gap in available knowledge, but an estimate of the total income of all such women is not so fortunate.
As an illustration, suppose that the frame is a serialized list of N women aged 15 or over and that the sample is a simple random sample of n of these women, drawn with replacement by reading out n random numbers between 1 and N. Information on the n women is collected, and it is noted which ones belong to the specified subclass, that is, which ones live in a certain district, are gainfully employed, and have at least one child under 12 years old at home. Suppose that this number is ns and that the average income of the ns women is x̄s. Of course, ns is a random variable with a binomial distribution.
What is the rel-variance of x̄s? Let Cs² be the rel-variance between incomes of the women in the frame that belong to the subclass. It is a fact that the conditional rel-variance of x̄s, for samples of size ns of the specified subclass, will be Cs²/ns, just as if the women of this subclass had been set off beforehand into a separate stratum and a sample of size ns had been drawn from it.
The conditional expected value of x̄s over all samples of fixed size ns in the subclass has moreover the convenient property of being the average income of all the women in the frame that belong to this subclass. It is for this reason that the conditional rel-variance of x̄s is useful for assessing the precision of a sample at hand. For purposes of design, one uses the rel-variance of x̄s over all samples of size n, which is Cs² E(1/ns), or very nearly Cs²/nP, where P is the proportion of all women 15 or over that meet the specification of the subclass, and P + Q = 1.
In contrast, any estimator, Xs, of the aggregate income of all the women in the specified subclass will not have such convenient properties as x̄s. The conditional expectation of Xs, for samples of size ns, is not equal to the aggregate income of all the women in the frame that belong to the subclass. The conditional rel-variance of Xs for a sample of size ns at hand, although equal to the conditional rel-variance of x̄s, therefore requires careful interpretation. Instead of attempting to interpret the conditional rel-variance of Xs, one may elect to deal with the variance of Xs in all possible samples of size n. Thus, if Xs is set equal to (N/n)nsx̄s (here N/n is used as an expansion factor equal to the reciprocal of the probability of selection), it is a fact that the rel-variance of Xs over all samples of size n will be (Cs² + Q)/nP (see Deming 1960, p. 129).
The problem with Xs arises from the assumption that Ns, the number of women in the frame that meet the specification of the subclass, is unknown. If Ns were known, one could form the estimator Xs = Nsx̄s, which would have all the desirable properties of x̄s.
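The inflation of the rel-variance of Xs when Ns is unknown can be checked numerically. The sketch below assumes the standard form (Cs² + Q)/nP for n draws with replacement, where Cs² is the rel-variance within the subclass and P is the proportion of the frame in the subclass; the frame values and function names are hypothetical. Writing zi = xi for a unit in the subclass and zi = 0 otherwise, Xs is N times the sample mean of the zi, so its rel-variance can be computed exactly from the frame.

```python
def exact_rel_variance_of_total(x_frame, in_subclass, n):
    """Exact rel-variance of Xs = (N/n) * ns * (mean of sampled subclass x),
    for n equal-probability draws with replacement."""
    N = len(x_frame)
    z = [x if s else 0.0 for x, s in zip(x_frame, in_subclass)]
    mean_z = sum(z) / N
    var_z = sum((v - mean_z) ** 2 for v in z) / N
    return var_z / (n * mean_z ** 2)

def formula_rel_variance_of_total(x_frame, in_subclass, n):
    """(Cs^2 + Q) / (n P), the assumed closed form."""
    xs = [x for x, s in zip(x_frame, in_subclass) if s]
    P = len(xs) / len(x_frame)
    Q = 1.0 - P
    mean_s = sum(xs) / len(xs)
    var_s = sum((x - mean_s) ** 2 for x in xs) / len(xs)
    return (var_s / mean_s ** 2 + Q) / (n * P)

# Hypothetical frame: incomes of 10 women, 4 of whom are in the subclass.
incomes = [30.0, 45.0, 28.0, 52.0, 40.0, 35.0, 61.0, 33.0, 47.0, 38.0]
member = [True, False, False, True, False, True, False, False, True, False]

print(exact_rel_variance_of_total(incomes, member, n=5))
print(formula_rel_variance_of_total(incomes, member, n=5))
```

The two figures agree, and the term Q shows how much of the rel-variance comes from not knowing the size of the subclass.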
One way to reduce the variance of the total income, Xs, of the specified class is (1) to select from the frame a large preliminary sample, (2) by an inexpensive investigation to classify the units of the preliminary sample into two classes, those that belong to the specified class and those that do not, and (3) to investigate a sample of the units that fell into the specified class, to acquire information on income. The preliminary sample provides an estimate of Ns, and the final sample provides an estimate of x̄s. The product gives the estimate Xs = Nsx̄s for the total income in the specified class. (For the variance of Xs and for optimum sizes of samples, see Hansen, Hurwitz, & Madow, 1953, vol. 1, pp. 65 and 259.)
If, further, N were not known and only the probability, π, of selection, to be applied to every sampling unit in the frame, were known, both n and ns will be random variables, and there will be a further inflation of the rel-variance of any estimator of the aggregate income of all the women in the specified subclass. Thus, if Xs be set equal to nsx̄s/π for such an estimator, then the unconditional rel-variance of Xs will be (Cs² + 1)/nP, where Cs² is the rel-variance between incomes within the subclass and n stands for the expected size of the sample. The conditional rel-variance of x̄s, however, is still Cs²/ns.
It may be noted that for a small subclass there is little difference between (Cs² + Q)/nP and (Cs² + 1)/nP.
Examples are common. Thus, one might read out a two-digit random number for each line of a register, following the rule that the item listed on a line will be drawn into the sample if the random number is 01. If counts from outside sources are not at hand or are not used, then the rel-variance of an estimator, Xs, of the total number or total value of any subclass of items on the register contains the factor Cs² + 1.
Use of thinning digits
Reduction of the probability of selection of units of specified characteristics (such as items of low value) through the use of thinning digits may produce either the factor Cs² + Q or the factor Cs² + 1 in the rel-variance of an estimator of an aggregate, depending on the mode of selecting the units.
Estimates of variances
Estimates of variances are supplied by the sample itself, under proper conditions, as was discussed above. Some of the more important estimators follow, denoted by a circumflex ( ̂ ). For the variance of x̄,
with a similar expression for the variance of ȳ. For the covariance,
Eqs. (21) and (22), with N infinite, were developed by Gauss (1823). These estimators are unbiased; σ̂x̄ is a slightly biased estimator of σx̄, but the bias is negligible for n moderate or large.
Under almost all conditions met in practice, one may set
and compare this quantity with tabulated values of t to find the margin of uncertainty in x for any specified probability. Such calculations give excellent approximations unless the distribution of sampling units in the frame is highly skewed. Extreme skewness may often be avoided by stratification (discussed below).
A useful approximate estimator for the rel-variance of f = X/Y = x̄/ȳ is
This formula is derived by combination of eqs. (19), (21), and (22). In accordance with a previous remark, one may take ĈX′ = Ĉf, where X′ is the ratio estimator of A as given by (17).
Size of frame usually not important
Because of the way in which N enters the variances, the size of the frame has little influence on the size of sample required for a prescribed precision, unless the sample is 20 per cent or more of the total frame. For instance, the sample required to reach a specified precision would be the same for the continental United States as for the Boston Metropolitan Area, on the assumption that the underlying variances encountered are about the same for the entire United States as for Boston.
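A small calculation makes the point concrete. The sketch below assumes the usual form for simple random sampling without replacement, rel-variance of x̄ = (1/n − 1/N)C², and solves it for n; the function name and the numerical values are hypothetical.

```python
import math

def required_sample_size(C, target_cv, N):
    """Smallest n for which (1/n - 1/N) * C**2 <= target_cv**2.

    C         : coefficient of variation between sampling units in the frame
    target_cv : coefficient of variation desired for the estimate
    N         : number of sampling units in the frame
    """
    return math.ceil(1.0 / ((target_cv / C) ** 2 + 1.0 / N))

# Same underlying variability, frames of very different size.
for N in (1_000, 1_000_000, 200_000_000):
    print(N, required_sample_size(C=1.0, target_cv=0.05, N=N))
```

For the two large frames the required sample is the same; only for the frame of 1,000 units, where the sample is a large fraction of the frame, does the requirement fall.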
Special form for attributes (0,1 variate)
In many studies a sampling unit gives only one of two possible observations, such as yes or no, male or female, heads or tails. The above equations then assume a simple form.
If each person in a frame is a sampling unit, and if ai = 1 for yes, ai = 0 for no, then the total x-population, A, in the frame is the total number of yes observations that would be recorded in the equal complete coverage, and a is the proportion yes, commonly denoted by p. The variance between the ai in the frame is
where p + q = 1.
The random variate, xi will take the value 0 or 1;
will be the number of yes observations in the sample, and
will be the proportion yes in the sample. Replacement of x̄ by p̂ in previous equations shows that p̂ is an unbiased estimator of the proportion yes in the frame and that
It is important to note that this variance is valid only if each sampling unit produces the value 0 or 1. It is not valid, for instance, for a sample of segments of area if there is more than one person per segment, or if the segments are clustered (as discussed below).
For an estimate of the variance of p̂ (provided the sampling procedure meets the conditions stated) one may use
where p̂ + q̂ = 1.
How good is an estimator of a variance?
The variance of the estimator in eq. (21) depends on the standardized fourth moment, β2, of the frame and on the number of degrees of freedom for the estimator. Thus, if one defines
then the rel-variance of the estimator of eq. (21) will be (β2 − 1)/n, which diminishes with n.
Systematic selection
A simple and popular way to spread the sample over the frame is to select every Zth unit, with a random start between 1 and Z, where Z = N/n. This is called systematic sampling with a single random start, and it is one form of patterned sampling. In certain kinds of materials, specifically those in which nearby sampling units are, on the average, more similar than units separated by a longer interval, systematic sampling will be slightly more efficient than stratified random sampling (Cochran 1946).
A disadvantage of systematic sampling with a single random start is that there is no strictly valid way to make a statistical estimate of the standard error of a result so obtained. This is because the single start is equivalent to the selection of only one sampling unit from the Z possible sampling units that could be formed. One may nevertheless, under proper conditions, get a useful approximation to the rel-variance by using the sum of squares of successive pairs. Eq. (21) with n = 2 and N = Z gives the estimator
where the summation runs over all pairs.
Hidden and unsuspected periodicities often turn up, and in such cases the above formula may give a severe underestimate or overestimate of the variance. For example, every nth household might be nearly in phase with the natural periodicity of income, rent, size of family, and other characteristics associated with corners and with the configuration of dwelling units within areas and within apartment houses. Systematic sampling of physical elements or of time intervals can lead to disaster.
A statistician will therefore justify a single random start and use of eq. (31) only if he has had long experience with a body of material.
Instead of a single random start between 1 and N/n, one may take two random starts between 1 and 2N/n and every (2N/n)th sampling unit thereafter. Extension to multiple random starts is obvious. Two or more random starts give a valid estimate of the variance. Fresh random starts every six or eight zones will usually reap any possible advantage of systematic sampling and will avoid uncertainty in estimation of the variance.
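Selection with one or several random starts can be sketched as follows. The function name is hypothetical, and the sketch assumes for simplicity that the interval Z = (number of starts) × N/n is a whole number and that the starts are drawn as distinct numbers.

```python
import random

def systematic_sample(N, n, starts=1, rng=None):
    """Return one list of serial numbers (1-based) per random start.

    Each start lies between 1 and Z = starts * N / n and is followed by
    every Z-th unit, so the starts together yield n units in all.
    """
    rng = rng or random.Random()
    if (starts * N) % n != 0 or n % starts != 0:
        raise ValueError("this sketch needs starts*N/n and n/starts integral")
    Z = starts * N // n
    first_units = rng.sample(range(1, Z + 1), starts)  # distinct starts
    return [list(range(s, N + 1, Z)) for s in first_units]

rng = random.Random(7)
print(systematic_sample(120, 12, starts=1, rng=rng))  # every 10th unit
print(systematic_sample(120, 12, starts=2, rng=rng))  # two networks, every 20th
```

With two or more starts, each list is a separate systematic sample of the whole frame, and the variation between the results of the lists supplies the valid estimate of variance mentioned in the text.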
Efficiency of design
The relative efficiency of two sampling procedures, I and II, that give normally distributed estimators of some characteristic is by definition the ratio of the inverses of the variances of these estimators for the same size, n, of sample. In symbols (E denotes efficiency),
This concept of efficiency is due to Fisher (1922). Comparison of costs is usually more important than comparison of numbers of cases. Let the costs be cI and cII for equal variances. Then
Comparison of efficiencies of estimators whose distributions depart appreciably from normality requires special consideration.
Sampling moving populations
A possible procedure in sampling moving populations is to count and tag all the people visible from a number of enumerators’ posts through a period of a day or a week (the first round) and then to repeat the count from the same or different posts some time later (the second round). The n1 people counted and tagged in the first round constitute a mobile frame for the second round. If the number of people counted in the first round is n1, and if the number counted in the second round is n2, with an intersection of n12 for people counted in both rounds, then an estimator of the total number of mobile inhabitants in the whole area is N̂ = n1n2/n12 (Yates 1949, p. 43; Deming & Keyfitz 1967).
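The estimator N̂ = n1n2/n12 is a one-line computation. The counts below are hypothetical, and the function name is chosen only for this illustration.

```python
def mobile_population_estimate(n1, n2, n12):
    """Estimate the total from two rounds: N-hat = n1 * n2 / n12.

    n1  : people counted and tagged in the first round
    n2  : people counted in the second round
    n12 : people counted in both rounds (tagged people seen again)
    """
    if n12 <= 0:
        raise ValueError("no overlap between rounds; the estimate is undefined")
    return n1 * n2 / n12

# Hypothetical counts: 200 tagged, 150 seen later, 30 of them already tagged.
print(mobile_population_estimate(200, 150, 30))
```

The reasoning is that the tagged fraction observed in the second round, n12/n2, estimates the tagged fraction of the whole population, n1/N.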
More complex designs
Considerations of cost—clustering
The total cost of a survey includes cost of preparing the frames and cost of travel to the units selected. In some surveys it may be possible to get more information per unit cost by enlarging the sampling unit, a procedure commonly called clustering. One may, for example, define a sampling unit as comprising all the dwellings in a compact segment of area. Further, one may, with experience and care, subsample dwelling units from a selected cluster or select one member of a family where two or more members qualify for the universe. Again, in a national survey, one may restrict the sample to a certain number of counties that will come into the sample by a random process. Or, in a survey of a city, one may restrict the sample to a random selection of blocks.
Any such plan reduces the interviewer’s expenses for travel and reduces the cost of preparing the frame. However, restriction of the sample usually also increases variances, unless the total number of households in the sample be increased as compensation. It should be remembered, though, that the actual precision obtained by the use of cluster sampling may be nearly as good as that obtained by an unrestricted random selection of the same number of dwelling units with no clustering.
Theory indicates the optimum balance between enlargement of the sampling unit and the number of sampling units to include in the sample. Obviously, the theory is more complex than that discussed in the last section. Stratification, ratio estimators, and regression estimators are additional techniques that, under certain conditions, yield further increases in efficiency (see below).
An example
The following illustration refers to a sample of a city: (i) Suppose that it has been determined in advance that for the main purposes of a survey the optimum size of areal unit is a compact group of five dwelling units, called a segment. (ii) A sampling unit within the city will consist of n̄ segments from a larger number of segments contained in a block. The n̄ segments of a sampling unit (if n̄ > 1) should be scattered over the block. A good way to effect this scatter is by a systematic selection. (iii) The m sampling units in the city will be selected by random numbers. For simplicity, assume that all blocks in the city contain an equal number, N̄, of segments. Suppose that there are
M blocks in the whole city,
N̄ segments in a block,
N = MN̄ segments in the whole city,
n̄ segments in a sampling unit,
N̄/n̄ sampling units in a block,
MN̄/n̄ or N/n̄ sampling units in the whole city,
m sampling units in the sample.
Then if
one may take
for an estimator of the x-population in the whole city. For this estimator,
and
If m is small compared with M,
If, also, n̄ is small compared with N̄,
Here σb² is the variance between blocks of the mean x-population per sampling unit, and σw² is the average variance between sampling units within blocks.
Important principle in size of secondary unit
Suppose that the cost of adding one more block to the sample is c1 (cost of maps, preparation, delineation of segments, travel) and that the cost of an interview in an additional sampling unit is c2. Then the total cost of the survey will be
In eq. (37) var x̄ will be at its minimum for a fixed cost K if
This equation was derived by both L. H. C. Tippett and Shewhart, independently, in 1931.
Note that m does not appear in this equation. That is, the optimum value of n̄ on the basis of the cost function (40) is independent of m, the number of sampling units in the sample (and very nearly independent of the number of blocks in the sample).
The optimum m is found by substituting the optimum n̄ from eq. (41) into eq. (40) and solving for m. Of course, it is necessary to assume values for σw/σb and for c1/c2 to do this. (Because each sampling unit will usually fall in a different block, m will usually be exactly or nearly as large as the number of blocks in the sample.) Usual numerical values of σw : σb and of c1 : c2 lead to small values of n̄ and to large values of m. Efficient design therefore usually requires a small sample from a block and dispersion of the sample into a large number of blocks.
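The balance between n̄ and m can be illustrated numerically. The sketch below is not a transcription of eqs. (37), (40), and (41); it assumes the common two-stage forms var x̄ ≈ σb²/m + σw²/(mn̄) and K = mc1 + mn̄c2, for which the optimum is n̄ = (σw/σb)√(c1/c2). The function names and all numerical values are hypothetical.

```python
from math import sqrt

def optimum_design(sigma_w, sigma_b, c1, c2, K):
    """Optimum segments per block (n_bar) and number of blocks (m)
    for a fixed total cost K, under the assumed variance and cost forms."""
    n_bar = (sigma_w / sigma_b) * sqrt(c1 / c2)
    m = K / (c1 + n_bar * c2)
    return n_bar, m

def variance_at(n_bar, sigma_w, sigma_b, c1, c2, K):
    """Assumed var x-bar when the whole budget K is spent at this n_bar."""
    m = K / (c1 + n_bar * c2)
    return sigma_b ** 2 / m + sigma_w ** 2 / (m * n_bar)

n_bar, m = optimum_design(sigma_w=2.0, sigma_b=1.0, c1=9.0, c2=4.0, K=2100.0)
print("optimum n_bar:", n_bar, " optimum m:", m)
for trial in (2.0, 3.0, 4.0):
    print(trial, variance_at(trial, 2.0, 1.0, 9.0, 4.0, 2100.0))
```

In this example the optimum n̄ does not depend on the budget K; a larger budget buys more blocks, not more segments per block.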
Extension of this theory to a national sample, and to stratified designs and ratio estimators, leads to the same principle.
Variation in size of segment will increase var x̄ by the factor 1 plus the rel-variance of the distribution of the number of dwelling units per segment. A similar factor measures the increase in var x̄ from variation in the number of segments per block.
Replicated designs for ease in estimation of variance
Replication of a sample in two or more interpenetrating networks of samples will provide a basis for rapid calculation of a valid estimate of the standard error of any result, regardless of the complexity of the procedure of selection and of the formulas for the formation of estimates [Mahalanobis 1944; Deming 1950; 1960; see also Index Numbers, article on Sampling].
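The calculation from replicates can be sketched as follows. The sketch assumes k independent networks, each selected and processed in the same way and each yielding its own estimate of the same quantity; the function name and the figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def replicated_estimate(replicate_estimates):
    """Combine k replicate estimates.

    Returns the mean of the replicates and its estimated standard error,
    sqrt(sum((x_i - mean)**2) / (k * (k - 1))).
    """
    k = len(replicate_estimates)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    return mean(replicate_estimates), stdev(replicate_estimates) / sqrt(k)

# Hypothetical estimates from four interpenetrating networks.
estimate, standard_error = replicated_estimate([10, 12, 11, 13])
print(estimate, standard_error)
```

Nothing in the calculation depends on how each replicate estimate was formed, which is why the method applies to designs of any complexity.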
Stratified sampling
The primary aim of stratified sampling is to increase the amount of information per unit of cost. A further aim may be to obtain adequate information about certain strata of special interest.
One way to carry out stratification is to rearrange the sampling units in the frame so as to separate them into classes, or strata, and then to draw sampling units from each class. The goal should be to make each stratum as homogeneous as possible, within limitations of time and cost. Stratification is equivalent to blocking in the design of an experiment. It is often a good plan (i) to draw a preliminary sample from the frame without stratification; (ii) to classify into strata the units in the preliminary sample, and (iii) to draw, for the final sample, a prescribed number of sampling units from each stratum so formed. Step (ii) will sometimes require an inexpensive investigation or test of every sampling unit in the preliminary sample to determine which stratum it belongs to.
Stratification is one way to make use of existing information concerning the frame other than the information obtained from investigating the sampling units in the final sample itself. Other ways to use existing information are through ratio estimators and regression estimators (see below).
In practice a frame is to some extent naturally stratified to begin with. Thus, areas in geographic order usually are already pretty well stratified in respect to income, occupation, density of population, tastes of the consumer, and other characteristics. No frame arrives thoroughly mixed, and any plan of sampling should be applied by zones, so as to capture the natural stratification. Theory serves as a guide to determine whether further stratification would be profitable.
Table 2: Notation and definitions for the frame (M = 2 strata) |
STRATUM | Number of sampling units in the frame | Number of sampling units in the sample | Stratum’s proportion of sampling units in the frame | Average population per sampling unit in the stratum | Total population in the stratum | Standard deviation between the populations of the sampling units within the stratum | Variance between the populations of the sampling units within the stratum |
---|---|---|---|---|---|---|---|
1 | N1 | n1 | P1 = N1/N | a1 | A1 = N1a1 | σ1 | σ1² |
2 | N2 | n2 | P2 = N2/N | a2 | A2 = N2a2 | σ2 | σ2² |
Total for the frame | N | n | 1 | - | A | - | - |
Unweighted average per stratum | - | - | - | | | | |
Weighted average per sampling unit | - | - | - | | - | σ̄w | |
Source: Deming 1960, p. 286.
Plans of stratification for enumerative studies
Several plans of stratified sampling for enumerative studies will now be described.
The notation and definitions to be used in this discussion are given in tables 2 and 3. (Note that N̄ and n̄ are defined differently here than they were earlier.) These tables are presented in terms of two strata (M = 2), but extension to a greater number of strata follows obviously. The following additional definitions are needed:
the average reverse variance between sampling units within strata, and
the average reverse standard deviation between sampling units within strata, where Qi + Pi = 1.
Plan A (no stratification): The scheme of sampling described above will be designated plan A. It is needed here for comparison, and also because it constitutes the basis for selection from any stratum.
Note that in plan A, as in plans B, D, F, and H, below, all the sampling units in the frame have equal probability of selection, namely n/N, wherefore Ex̄ = a and EX = A.
Pi known, whole frame classified. Two sampling plans for which the proportions in each stratum are known (or ascertainable) and the whole frame is classified will now be described.
Plan B (proportionate sampling): Decide with the help of eq. (47) the size, n, of the sample required. Compute next
Draw by random numbers, as in plan A, a sample of size ni from stratum i. Investigate every member of the sample, and calculate
Table 3: Notation and definitions for the sample |
Stratum | Population in the sample | Mean population per sampling unit | Estimated total population | Variance of this estimator |
---|---|---|---|---|
1 | x1 | x̄1 = x1/n1 | X1 = N1x̄1 | var X1 |
2 | x2 | x̄2 = x2/n2 | X2 = N2x̄2 | var X2 |
Sum | x | - | X | var X* |
*The variances are additive only if the Ni (or Pi) are known and used in the estimator X.
Source: Deming 1960, p. 287.
(For simplicity, most formulas will henceforth be written for two strata, in conformance with tables 2 and 3. Extension to more strata is obvious.) Here, ni/Ni = n/N, wherefore
and
The ni of eq. (44) and later expressions will not in general be integers. In practice one uses the closest integer; the effects on variance formulas are usually completely negligible.
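Proportionate allocation and the rounding just mentioned can be sketched as follows, assuming the rule ni = nNi/N implied by ni/Ni = n/N. The function name and the stratum sizes are hypothetical.

```python
def proportionate_allocation(stratum_sizes, n):
    """Return the exact allocations n * Ni / N and their closest integers."""
    N = sum(stratum_sizes)
    exact = [n * Ni / N for Ni in stratum_sizes]
    rounded = [round(v) for v in exact]
    return exact, rounded

# Hypothetical frame of 10,000 sampling units in two strata, sample of 125.
exact, rounded = proportionate_allocation([6000, 4000], 125)
print(exact, rounded)

# A case in which the exact allocations are not integers.
print(proportionate_allocation([6100, 3900], 125))
```

After rounding, the sampling fractions in the strata differ slightly from n/N, and the total of the rounded figures can differ from n by a unit or so.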
Plan C (Neyman sampling): Decide with the help of eq. (49) the size, n, of the sample required. Compute next the Neyman allocation (Neyman 1934),
Draw by random numbers, as in plan A, a sample of size ni from stratum i. Investigate every member of the sample. Form estimators X1, X2, and X = X1 + X2. Form x̄ = X/N for an unbiased estimator of a. Here
var x̄ ≅ (P1σ1 + P2σ2)²/n − (P1σ1² + P2σ2²)/N. (49)
The Neyman allocations are the optimal ni for minimizing var x̄ when the Pi are known.
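The Neyman allocation can be sketched as follows, assuming its usual form, ni proportional to Piσi. The function name and the numerical values are hypothetical.

```python
def neyman_allocation(P, sigma, n):
    """Allocate n to strata in proportion to Pi * sigma_i.

    P     : proportions of the frame in the strata (summing to 1)
    sigma : standard deviations between sampling units within the strata
    """
    weights = [p * s for p, s in zip(P, sigma)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: the smaller stratum is four times as variable.
print(neyman_allocation(P=[0.6, 0.4], sigma=[1.0, 4.0], n=110))

# With equal standard deviations the result is proportionate sampling.
print(neyman_allocation(P=[0.6, 0.4], sigma=[2.0, 2.0], n=110))
```

The allocation moves the sample toward the stratum that is more variable, even though it holds the smaller share of the frame.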
Pi known, only a sample classified. One may, in appropriate circumstances, require only the classification of a preliminary sample drawn from the frame. The decision hinges on the costs of classification and the expected variances of the plans under consideration.
Plan D: Decide with the help of eq. (50) the size, n, of the sample required. Draw the sample as in plan A. Classify the sampling units into strata. The number, ni, of sampling units drawn from stratum i will be a random variable. Carry out the investigation of every unit of the sample. Form X1, X2, X, and x̄ as in plan B. Then
Plan E: Decide with the help of eq. (52) the size, n, of the final sample. Draw by random numbers a preliminary sample of size n’. Thin (reduce) by random numbers the strata of the preliminary sample to reach the Neyman ratios
and simultaneously the total sample, n. Here n′1, n′2, etc., are the sizes of the preliminary sample in the several strata, and n1, n2, etc., are the sizes of the final sample. For greatest economy, choose n′ so that one stratum will require no thinning. Carry out the investigation of every unit of the final sample. Form the estimators X1, X2, and X = X1 + X2. Then x̄ = X/N will again be an unbiased estimator of a, but now
the latter form useful if N is large relative to n’.
Sequential classification of units into strata
We now describe two plans in which the sample-sizes, ni, are reached sequentially, with considerable saving under appropriate conditions.
Plan F: Determine the desired sample-sizes, ni, as in plan B. Draw by random numbers one unit at a time from the frame, and classify it into its proper stratum. Continue until the quotas, ni, are all filled. Form X as in plan B; var x̄ will be the same as for plan B.
Plan G: This is the same as plan F except that the sample sizes, ni, are fixed as in plan C. Form X as in plan C; var x̄ will be the same as for plan C.
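The sequential filling of quotas in plans F and G can be sketched as follows. The frame, the stratum labels, and the function name are hypothetical; a random ordering of the frame stands in for drawing one unit at a time by random numbers, and the sketch assumes that every label in the frame has a quota.

```python
import random

def fill_quotas(stratum_of_unit, quotas, rng):
    """Draw units one at a time, classify each, and keep it if its
    stratum still has room. Stop when every quota is filled.

    stratum_of_unit : stratum label of every unit in the frame; a label
                      is looked up only when its unit is drawn
    quotas          : dict mapping stratum label to the desired ni
    Returns the sample by stratum and the number of units classified.
    """
    order = list(range(len(stratum_of_unit)))
    rng.shuffle(order)  # random order of drawing, without replacement
    sample = {label: [] for label in quotas}
    classified = 0
    for unit in order:
        if all(len(sample[s]) >= quotas[s] for s in quotas):
            break
        classified += 1
        label = stratum_of_unit[unit]
        if len(sample[label]) < quotas[label]:
            sample[label].append(unit)
    return sample, classified

frame = ["a"] * 60 + ["b"] * 40  # hypothetical frame of 100 units
sample, classified = fill_quotas(frame, {"a": 6, "b": 4}, random.Random(0))
print(sample, classified)
```

Only the units actually drawn are classified, which is the source of the saving over plans that classify the whole frame.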
Pi not known in advance. When the proportions, Pi, in the frame are unknown, estimates thereof must come from a sample, usually a preliminary sample of size N’ > n, where n is the size of the final sample.
Plan H: Decide with the help of eq. (55) the size, n, for the final sample. Compute the optimum size, N’, of the preliminary sample by the formula
where c1 is the average cost of classifying a sampling unit in the preliminary sample, and c2 is the average cost of the final investigation of one sampling unit.
The procedure is to draw as in plan A a preliminary sample of size N’ and to classify it into strata. Treat the preliminary sample as a frame of size N’. Then thin all strata of the preliminary sample proportionately to reach the final total size, n.
Carry out the investigation of every sampling unit in the final sample. An unbiased estimator of a is
where x is the total x-population in the sample. Then
plan H,
is an excellent approximation if N be large relative to N’.
Plan I: Decide with the help of eq. (59) the size, n, for the final sample. Compute the optimum size, N’, of the preliminary sample, using the equation (Neyman 1938)
Draw as in plan A a preliminary sample of size N’. Classify it as in plan H. Thin the strata differentially to satisfy the Neyman ratios
and to reach the desired final total sample-size, n. Carry out the investigation of every sampling unit in the final sample. An unbiased estimator of a is
for which (59)
is an excellent approximation if N be large relative to N’ and to n.
One may use plan F or plan G in combination with plan H or plan I to reap the benefit of many strata without actually classifying the entire preliminary sample, N’ (Roller 1960).
Gains of stratified sampling
Gains of stratified sampling can be evaluated by comparing variances. Denote by A, B, and C the variances of the estimators of a calculated by the plans A, B, and C. Then
For example, if P1 = .6, P2 = .4, and would be (1 − .8)/1 = .2, meaning that 100 interviews selected according to plan B would give rise to the same variance as 125 selected according to plan A.
The gains of plans F and G over plan A are the same as the gains of plans B and C over plan A. The average gains in repeated trials of plans D and E are less. If and are large, plans D and E will usually not be good choices. For large samples, however, in circumstances where and are not large, the gains of plans D and E may be almost equal to the gains of plans B and C, at considerably less cost.
Eqs. (60) and (61) show that the gain to be expected from the proposed formation of a new stratum, i, will not be impressive unless its proportion, Pi, be appreciable, or unless its σi or its ai be widely divergent from the average.
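The comparison of plans A, B, and C can be illustrated numerically. The sketch below does not reproduce eqs. (60) and (61); it assumes the usual forms with the finite multiplier ignored, namely total variance = within-stratum variance + between-stratum variance, A = σ²/n, B = ΣPiσi²/n, and C = (ΣPiσi)²/n. The function name and all numerical values are hypothetical.

```python
def stratification_gains(P, a, var_within):
    """Relative gains (A - B)/A and (A - C)/A for a sample of equal size.

    P          : proportions of the frame in the strata
    a          : stratum means
    var_within : variances between sampling units within the strata
    """
    overall_mean = sum(p * ai for p, ai in zip(P, a))
    between = sum(p * (ai - overall_mean) ** 2 for p, ai in zip(P, a))
    within = sum(p * v for p, v in zip(P, var_within))
    total = within + between
    neyman = sum(p * v ** 0.5 for p, v in zip(P, var_within)) ** 2
    return (total - within) / total, (total - neyman) / total

gain_B, gain_C = stratification_gains(
    P=[0.6, 0.4], a=[1.0, 2.0], var_within=[0.96, 0.96]
)
print(gain_B, gain_C)
# Size of a plan-A sample with the same variance as 100 units under plan B.
print(100 / (1 - gain_B))
```

With these made-up values the gain of plan B is .2, so that 100 units under plan B match 125 under plan A; plan C adds nothing here because the two strata are equally variable.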
Stratification to estimate over-all ratio
The case to be used for illustrating stratified sampling to estimate an over-all ratio consists of three strata: stratum 1 for large units (for example, high incomes or large farms), stratum 2 for medium-sized units, and stratum 3 for small units. Here stratum 1 is to be covered 100 per cent; obvious modifications take care of the case in which stratum 1 is not sampled completely.
First take as an estimator of Φ
in the notation of tables 1 and 2, with Bi as the value of the y-characteristic in stratum i of the frame. Optimum allocation to strata 2 and 3 is very nearly reached if both
and
wherein s2 and s3 are the standard deviations of the ratio of x to y in strata 2 and 3.
If, as is often the case, s2 and s3 do not differ much, or if little is known about them in advance, one can still make an important gain in efficiency by setting n2:n3 = B2:B3 or n2:n3 = A2:A3.
Another estimator of the ratio Φ is
wherein Pi = Bi/B and fi = x̄i/ȳi. This estimator is sometimes preferred when fi varies greatly from stratum to stratum, and when there can be no trouble with small denominators. The allocation of sample for this estimator is, for practical purposes, the same as in eq. (63) (Cochran [1953] 1963, p. 175; Hansen et al. 1953, vol. 1, p. 209).
Sequential adjustment of size of sample
It is sometimes possible, when decision on the size of sample is difficult, or when time is short, to break the sample in advance into two portions, 1 and 2, each being a valid sample of the whole frame. Portion 1 is definitely to be carried through to completion, but portion 2 will be used only if required. This may be called a two-stage sequential method. It is practicable where the investigation is to be carried out by a small number of experts that will stay on the job as long as necessary but not where a field force must be engaged in advance for a definite period.
Modifications for differing costs
If investigating a sampling unit in a particular stratum is three or more times as costly as the average investigation, it may be wise to decrease the sample in the costly stratum and to build up the sample in other strata (Deming 1960, p. 303).
Considerations for planning
In order to plan a stratified sample, certain assumptions are necessary. Fair approximations to the relevant ratios, such as σw:σ and σ̄w:σb, will provide excellent allocation. On the other hand, bad approximations to these ratios, or failure to use theory at all, can lead to serious losses.
The required good approximations to these ratios may come from prior experience, or from probing the knowledge of experts in the subject matter. For example, the distribution of intelligence quotients in the stratum between 90 and 110, if rectangular, would provide σ² = (110 − 90)²/12, or 33, whence σ = 5.7. Other shapes have other variances, but shape is fortunately not critical (Deming 1950, p. 262; 1960, p. 260). A stratum with very high values should be set off for special treatment and possibly sampled 100 per cent.
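The rectangular-distribution figure can be reproduced directly. The helper name `rectangular_sd` below is hypothetical and is introduced only to show the arithmetic.

```python
import math


def rectangular_sd(low, high):
    """Standard deviation of a rectangular (uniform) distribution on [low, high]."""
    variance = (high - low) ** 2 / 12
    return math.sqrt(variance)


print(rectangular_sd(90, 110))
```

For the stratum from 90 to 110 the variance is 400/12, about 33.3, and the standard deviation is about 5.77; the text rounds the variance to 33 before taking the root.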
Stratification for analytic studies
As mentioned earlier, the aim in an analytic study is to detect differences between classes or to measure the effects of different treatments.
The general formula for the variance of the difference between two means, x̄A and x̄B, derived from independent samples of sizes nA and nB drawn by random numbers singly and without stratification from, for example, two groups of patients, A and B, is
$$\operatorname{var}(\bar{x}_A - \bar{x}_B) = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B},$$
wherein σA² and σB² are the respective variances between the patients within the two groups.
For such analytic studies the optimum allocation of skill and effort is found by setting
$$\frac{n_A}{n_B} = \frac{\sigma_A}{\sigma_B}\sqrt{\frac{c_B}{c_A}},$$
wherein cA and cB are the costs per case. Note that the sizes of the groups do not enter into this formula and that it is different from the optimum allocation in enumerative problems.
In many analytic studies σA and σB will be about equal, and so will the costs cA and cB. In such circumstances, the best allocation is nA = nB.
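A short Python sketch may make the allocation concrete. It assumes the standard cost-weighted rule nA : nB = σA/√cA : σB/√cB and a fixed total budget; the function name `analytic_allocation`, the budget parameter, and the trial figures are hypothetical and are not taken from the article.

```python
import math


def analytic_allocation(sigma_a, sigma_b, cost_a, cost_b, budget):
    """Sample sizes minimising var(mean_A - mean_B) for a fixed budget.

    Assumes the rule n_A : n_B = sigma_A / sqrt(c_A) : sigma_B / sqrt(c_B).
    The sizes of the two groups themselves do not enter.
    """
    weight_a = sigma_a / math.sqrt(cost_a)
    weight_b = sigma_b / math.sqrt(cost_b)
    # Scale the weights so that n_A * c_A + n_B * c_B equals the budget.
    scale = budget / (weight_a * cost_a + weight_b * cost_b)
    return weight_a * scale, weight_b * scale


# Equal standard deviations and equal costs give equal sample sizes.
print(analytic_allocation(3, 3, 2, 2, 100))
```

The returned sizes are generally not whole numbers; in practice they would be rounded to the nearest feasible number of cases.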
Regression estimators
We have already seen reduction in variance resulting from use of prior or supplementary knowledge concerning the frame. Use of prior knowledge of N to form the estimator X = Nx̄ is an instance. Prior knowledge of B to form the ratio estimator is another instance. This section, on regression estimators, describes other ways to use prior or supplementary knowledge concerning the frame. Regression estimators include the simple estimator, x̄, and the ratio estimator, fb, as special cases, but they also include many other estimators, some of them highly useful. Like the ratio estimator of a total, these additional estimators are applicable only if independent and fairly reliable information is available about the y-population in the frame. Any estimator that takes advantage of supplementary information may have considerable advantage over the simple estimator, x̄, if the correlation, ρ, between xi and yi is high, but this condition is not in itself sufficient.
Specific forms of regression estimators
Assume simple random selection and write the regression estimator in the form
$$\bar{x}_i = \bar{x} + m_i(b - \bar{y}),$$
wherein b is known independently from some source such as a census. The subscript i on x̄i here differentiates the several specific forms of regression estimators.
Regression estimators are closely allied with the analysis of covariance [see Linear Hypotheses, article on Analysis Of Variance]. The four cases to be considered here are taken largely from Hansen, Hurwitz, and Madow (1953).
Simple estimator
If m1 is taken as zero, the regression estimator obtained is x̄1 = x̄, seen earlier. This procedure makes no use of supplemental information. Under the assumption that N is large relative to n, the variance of this estimator is
likewise,
Difference estimator
The estimator x̄2, often called the difference estimator, is practicable if prior knowledge (such as prior surveys of a related type) provides a rough approximation to the regression coefficient β = ρσx̄/σȳ. This estimator is
$$\bar{x}_2 = \bar{x} + m_2(b - \bar{y}),$$
where m2 is any approximate slope not derived from the sample under consideration. The variance of x̄2 is
$$\operatorname{var}\bar{x}_2 = \sigma_{\bar{x}}^2(1-\rho^2) + \sigma_{\bar{y}}^2(m_2-\beta)^2 = \sigma_{\bar{x}}^2(1-\rho^2+\rho^2\varepsilon^2),$$
where β = ρσx̄/σȳ and ε = (m2 − β)/β. (Note that ρε = (σȳ/σx̄)(m2 − β) even if ρ = 0.)
Table 4. Rel-variances when estimator of b is subject to sampling error
Estimator | Case I: sample of size n is drawn as a subsample of n’ | Case II: samples of size n and n’ are independent
---|---|---
x̄1 | | Same as in Case I
x̄2 | |
x̄3 | | Same as in Case I
x̄4 | |
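Least squares regression estimator. If m3 is chosen as
$$m_3 = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (y_i-\bar{y})^2},$$
then the equation
$$\bar{x}_3 = \bar{x} + m_3(b-\bar{y})$$
gives the so-called least squares regression estimator. The variance of this estimator is
$$\operatorname{var}\bar{x}_3 = \sigma_{\bar{x}}^2(1-\rho^2) + R.$$
Here R is a remainder in the Taylor series involving 1/n² and higher powers; this remainder will be negligible if n is large.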
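Ratio estimator. If m4 is chosen as
$$m_4 = \bar{x}/\bar{y},$$
the ratio estimator is
$$\bar{x}_4 = \bar{x} + \frac{\bar{x}}{\bar{y}}(b-\bar{y}) = \frac{\bar{x}}{\bar{y}}\,b = fb.$$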
The variance of x4 is
R’ being another remainder. For large n and for
It follows that for large n and for m2 = β and Cx̄ = Cȳ,
$$\frac{\operatorname{var}\bar{x}_4}{\operatorname{var}\bar{x}_2} = \frac{2}{1+\rho}.$$
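The ratio 2/(1 + ρ) can be recovered in three lines. The derivation below is a sketch that assumes the usual large-sample rel-variance of the ratio estimator, C²x̄ + C²ȳ − 2ρCx̄Cȳ, which is not shown in the text as it stands; a denotes the mean of the x-population per sampling unit, so that a²C²x̄ = σ²x̄.

```latex
\begin{aligned}
\operatorname{var}\bar{x}_2 &= \sigma_{\bar{x}}^2\,(1-\rho^2)
  && (m_2 = \beta),\\
\operatorname{var}\bar{x}_4 &\approx a^2\left(C_{\bar{x}}^2 + C_{\bar{y}}^2
  - 2\rho\,C_{\bar{x}}C_{\bar{y}}\right) = 2\,\sigma_{\bar{x}}^2\,(1-\rho)
  && (C_{\bar{x}} = C_{\bar{y}}),\\
\frac{\operatorname{var}\bar{x}_4}{\operatorname{var}\bar{x}_2}
  &= \frac{2(1-\rho)}{(1-\rho)(1+\rho)} = \frac{2}{1+\rho}.
\end{aligned}
```

Under these conditions the ratio estimator is never better than the difference estimator with the correct slope, and the two coincide only as ρ approaches 1.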
Comparison of regression estimators. If the correlation, ρ, between xi and yi is moderate or high, but the line of regression of x on y misses the origin by a wide margin, then the estimator x̄3 will show substantial advantages over x̄4 and x̄1. If the y-variate shows relatively wide spread (that is, if Cȳ is much greater than Cx̄), the ratio estimator x̄4 may be far less precise than the simple estimator x̄1 = x̄, even when ρ is high, especially if the line of regression misses the origin by a wide margin. On the other hand, if the line of regression passes through the origin (ρCx̄ = Cȳ), or nearly through it, x̄3 and x̄4 will have about the same variance, but x̄4 may be much easier to compute.
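The comparison can be illustrated by a small simulation. The sketch below is only an illustration: the frame is artificial (y spread evenly between 50 and 150, x = 100 + 0.5y plus random error, so that the regression line misses the origin widely), and the function name `compare_estimators`, the sample size, and the number of repetitions are all assumptions made here.

```python
import random
import statistics


def compare_estimators(n=50, reps=2000, seed=0):
    """Empirical variances of the simple, regression, and ratio estimators."""
    rng = random.Random(seed)
    # Hypothetical frame: the regression of x on y has a large intercept.
    frame_y = [rng.uniform(50, 150) for _ in range(5000)]
    frame_x = [100 + 0.5 * y + rng.gauss(0, 10) for y in frame_y]
    b = statistics.fmean(frame_y)  # known mean of the y-population

    simple, regression, ratio = [], [], []
    for _ in range(reps):
        idx = rng.sample(range(len(frame_y)), n)  # simple random selection
        xs = [frame_x[i] for i in idx]
        ys = [frame_y[i] for i in idx]
        x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
        # Least squares slope of x on y computed from the sample itself.
        m3 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
              / sum((y - y_bar) ** 2 for y in ys))
        simple.append(x_bar)                        # estimator 1
        regression.append(x_bar + m3 * (b - y_bar)) # estimator 3
        ratio.append(x_bar / y_bar * b)             # estimator 4
    return {
        "simple": statistics.pvariance(simple),
        "regression": statistics.pvariance(regression),
        "ratio": statistics.pvariance(ratio),
    }


print(compare_estimators())
```

In this artificial frame the y-variate has a much larger coefficient of variation than the x-variate, so the ratio estimator comes out less precise than the simple mean, while the least squares regression estimator is the most precise of the three.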
Estimator of b subject to sampling error. It often happens that the y-population per sampling unit is not known with the reliability of a census but comes instead from another and bigger sample. This circumstance introduces additional terms into the variances. Let n be the size of the present sample and n’ the size of the sample that provides the estimate of b. We suppose that the variance of this estimate of b is The resulting variances of the regression estimators are shown in Table 4.
W. Edwards Deming
BIBLIOGRAPHY
Bowley, Arthur L. (1901) 1937 Elements of Statistics. 6th ed. New York: Scribner; London: King.
Chevry, Gabriel 1949 Control of a General Census by Means of an Area Sampling Method. Journal of the American Statistical Association 44:373-379.
Cochran, William G. 1946 Relative Accuracy of Systematic and Stratified Random Samples for a Certain Class of Populations. Annals of Mathematical Statistics 17:164-177.
Cochran, William G. (1953) 1963 Sampling Techniques. 2d ed. New York: Wiley.
Dalenius, Tore 1962 Recent Advances in Sample Survey Theory and Methods. Annals of Mathematical Statistics 33:325-349.
Deming, W. Edwards (1943) 1964 Statistical Adjustment of Data. New York: Dover.
Deming, W. Edwards 1950 Some Theory of Sampling. New York: Wiley.
Deming, W. Edwards 1960 Sample Design in Business Research. New York: Wiley.
Deming, W. Edwards; and Keyfitz, Nathan 1965 Theory of Surveys to Estimate Total Population. Volume 3, pages 141-144 in World Population Conference, Belgrade, August 30-September 10, 1965, Proceedings. New York: United Nations.
Fisher, R. A. (1922) 1950 On the Mathematical Foundations of Theoretical Statistics. Pages 10.307a-10.368 in R. A. Fisher, Contributions to Mathematical Statistics. New York: Wiley. → First published in Volume 222 of the Philosophical Transactions, Series A, of the Royal Society of London.
Fisher, R. A. (1956) 1959 Statistical Methods and Scientific Inference. 2d ed., rev. New York: Hafner.
Gauss, Carl Friedrich 1823 Theoria combinationis observationum erroribus minimis obnoxiae. Göttingen (Germany): Dieterich. → A French translation was published in Gauss’ Méthode des moindres carrés (1855). An English translation of the French was prepared as Gauss’s Work (1803–1826) on the Theory of Least Squares, by Hale F. Trotter; Statistical Techniques Research Group, Technical Report, No. 5, Princeton Univ., 1957.
Hansen, Morris H.; and Hurwitz, William N. 1943 On the Theory of Sampling From Finite Populations. Annals of Mathematical Statistics 14:333-362.
Hansen, Morris H.; and Hurwitz, William N. 1946 The Problem of Non-response in Sample-surveys. Journal of the American Statistical Association 41:517-529.
Hansen, Morris H.; Hurwitz, William N.; and Madow, William G. 1953 Sample Survey Methods and Theory. 2 vols. New York: Wiley.
Kish, Leslie 1965 Survey Sampling. New York: Wiley. → A list of errata is available from the author.
Koller, Siegfried 1960 Aussenhandelsstatistik: Untersuchungen zur Anwendung des Stichprobenverfahrens. Pages 361-370 in Germany (Federal Republic), Statistisches Bundesamt, Stichproben in der amtlichen Statistik. Stuttgart (Germany): Kohlhammer.
Leven, Maurice 1932 The Income of Physicians: An Economic and Statistical Analysis. Univ. of Chicago Press.
Mahalanobis, P. C. 1944 On Large-scale Sample Surveys. Royal Society of London, Philosophical Transactions Series B 231:329-451.
Mahalanobis, P. C. 1946 Recent Experiments in Statistical Sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society Series A 109:326-378. → Contains eight pages of discussion.
Moser, C. A. 1949 The Use of Sampling in Great Britain. Journal of the American Statistical Association 44:231-259.
Neyman, Jerzy 1934 On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society Series A 97:558-606.
Neyman, Jerzy 1938 Contribution to the Theory of Sampling Human Populations. Journal of the American Statistical Association 33:101-116. → See especially equation 49 on page 110.
Quenouille, M. H. 1959 Rapid Statistical Calculations. London: Griffin; New York: Hafner. → Pages 5-7 show estimators of the standard deviation by use of the range.
Satterthwaite, F. E. 1946 An Approximate Distribution of Estimates of Variance Components. Biometrics 2:110-114.
Shewhart, Walter A. 1939 Statistical Method From the Viewpoint of Quality Control. Washington: U.S. Department of Agriculture, Graduate School.
Stephan, Frederick F. 1936 Practical Problems of Sampling Procedures. American Sociological Review 1:569-580.
Stephan, Frederick F. 1948 History of the Uses of Modern Sampling Procedures. Journal of the American Statistical Association 43:12-39.
Stephan, Frederick F.; and McCarthy, Philip J. 1958 Sampling Opinions: An Analysis of Survey Procedure. New York: Wiley.
Stuart, Alan 1962 Basic Ideas of Scientific Sampling. London: Griffin; New York: Hafner.
Symposium On Contributions Of Genetics To Epidemiologic Studies Of Chronic Diseases, Ann Arbor, Michigan, 1963 1965 Genetics and the Epidemiology of Chronic Diseases. U.S. Public Health Service, Publication No. 1163. Washington: Government Printing Office. → See especially “Selection techniques for rare traits,” by Leslie Kish, pages 165-176.
Yates, Frank (1949) 1960 Sampling Methods for Censuses and Surveys. 3d ed., rev. & enl. New York: Hafner. → Earlier editions were also published by Griffin.
II NONPROBABILITY SAMPLING
Nonprobability sampling refers to the selection of sampling units for a statistical study according to some criterion other than a probability mechanism. In contrast to probability sampling, nonprobability sampling has two major disadvantages: (1) the possibility of bias in the selection of sampling units and (2) the impossibility of calculating sampling error from the sample.
The attitude toward sampling held by most social scientists has come full circle: whereas it was once necessary to argue strenuously the case for deliberately using a probability mechanism in the selection of material for study, that case is now so widely accepted that any nonprobabilistic sampling procedure is regarded with suspicion. It will be argued here that such suspicion is generally well founded but that it is not to be equated with condemnation, since there are occasions when nonprobability sampling is the only procedure open to the investigator.
The issues involved in nonprobability sampling may be sharply focused by considering first the status of the inferences made by an archeologist. He can usually expect to find only a fraction of the material that he would like in order to examine the causes and motives of past events, and that fraction will generally be small for remote periods. Every object that he would judge relevant is a potential witness in his court of inquiry, but many of them have long since been destroyed or lost. Yet there is no contradicting the fact that some strongly convincing arguments can be made, even about the remote past. Does it follow, then, that probability sampling is inessential to any social scientist’s inferences about the present?
As soon as it is drawn, this analogy is seen to be false. The archeologist usually has no choice; he must, in the court metaphor, accept what witnesses offer themselves. The situation of another social scientist is different: he is rather in the position of having too many potential witnesses, among whom he must choose on grounds of cost and time. How is he to choose? He cannot fairly exclude those potential witnesses whose evidence may not be to his taste, for if he did, he would implicitly abandon his judicial position and become merely an advocate. (The fact that some have apparently undergone this transformation is not good reason for others to follow them.) If some sources of evidence must be chosen in preference to others, they can only conscientiously be chosen in some way unrelated to the nature of their evidence: they must be selected by a probability mechanism.
The value of probability sampling is, therefore, not that it ensures employment for statisticians, but that it provides a guarantee of freedom from selection bias on the part of the investigator, bias that may be strong even if quite unconsciously produced (Yates [1949] 1960, chapter 2). The absence of selection bias is the first advantage of probability sampling.
Nonprobability sampling always has the characteristic that the sampling procedure is ill-defined. One literally cannot know what chance of selection any individual in the population has had. It is this lack of precise definition that leads directly to the second disadvantage of nonprobability sampling: there is no way of estimating the sampling error of a nonprobability sample from the sample itself, while such estimation is typically possible when probability sampling is used.
Despite what has just been said, nonprobability samples will always be important in the social sciences. Confronted with a remote and hostile tribe, the anthropologist will not be inclined to imperil his foothold of hospitality and cooperation by selecting informants at random. In advanced societies, selected individuals or institutions may refuse to cooperate in an inquiry and usually cannot be compelled to do so. Even in such circumstances, however, at least an attempt at a probability sample should always be the investigator’s object. He may fail to achieve it but cannot be worse off than if he had not tried at all. The unshakable optimism of all candidates for political office must be partly attributed to reports from supporters who have largely been sampling the faithful.
Nonprobability samples can arise in several ways: through the investigator’s ignoring or denying the force of the above considerations; through his recognizing their force but claiming that the objectives of probability sampling can just as well be achieved by substitute methods; or finally, through his lack of success in attempts to achieve probability samples. About the first category nothing more will be said here, but a more detailed examination of the other two will be made.
Representative methods
Most of the substitutes suggested for probability sampling have in common some attempt to make the sample “representative” of the population from which it was selected, for example, by arranging that the proportions of men and women be the same in both or (more rarely) that the average income be the same in both. In practice, the most frequent substitute is quota sampling (Moser 1952), in which distributions by sex, age, “class,” and sometimes additional characteristics are equalized between sample and population and the interviewers are otherwise free to select cases. Such representative methods are to be sharply distinguished from stratified probability sampling, where the population is divided into groups prior to sampling and each group is sampled separately by probability methods.
There are several distinct criticisms to be made of representative procedures. In the first place, the agreement of even quite a large number of averages or percentages between sample and population in no way guarantees high accuracy in other respects, and very large biases are still possible (Neyman 1934). Second, the lack of definition in the sampling procedure is not corrected by the imposition of such agreements; sampling error is still not calculable.
The most important criticism, however, is a paradoxical one. There is no such thing as a “representative,” “unbiased,” “fair,” or otherwise “acceptable” sample: such adjectives are strictly applicable to the sampling process that produces the sample, and their application to the sample itself is at best a piece of verbal shorthand and at worst a sign of muddled thinking. A sample can be judged only in relation to the process that produced it. The central concepts of selection bias and sampling error have no meaning except in this context (Stuart 1962). Thus, an ill-defined sampling process, such as is involved in nonprobability samples, can only produce a sample with ill-defined properties of bias and sampling error.
Embedding in a probability framework
The statement that there is no way of calculating sampling error for nonprobability samples is always true but can be made irrelevant by the device of embedding a nonprobability sampling procedure within a higher-order probability framework. Suppose that a nonprobability sampling procedure is proposed in which 1,000 families are to be interviewed in a certain area and that ten interviewers are to be used. If each interviewer is independently given an assignment to interview 100 families by identical methods, there will be not one, but ten nonprobability samples, and the variation among the results of these ten samples may be used to estimate the sampling error attached to a sample of 100 families and thence, by a natural extension, to estimate the error for a sample of 1,000 families. Since the sample is designed as ten equivalent independent subsamples, in effect the sampling is of ten members from the population of all possible samples obtainable by that method with those interviewers.
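The calculation described above can be sketched as follows. The function name `replicated_standard_error` and the ten percentages used to try it out are invented for illustration; the only assumption carried over from the text is that the subsamples are independent, of equal size, and taken by identical methods.

```python
import math
import statistics


def replicated_standard_error(subsample_estimates):
    """Sampling error from k equivalent, independent subsamples.

    Each entry is the estimate (for example, a mean or a percentage)
    produced by one subsample of equal size.
    Returns (overall estimate, error of one subsample, error of the whole).
    """
    k = len(subsample_estimates)
    if k < 2:
        raise ValueError("need at least two subsamples")
    overall = statistics.fmean(subsample_estimates)
    # The standard deviation among subsample results estimates the error of
    # a single subsample; dividing by sqrt(k) gives the error of their mean.
    se_single = statistics.stdev(subsample_estimates)
    return overall, se_single, se_single / math.sqrt(k)


# Invented percentages reported by ten interviewers, 100 families each.
results = [52, 48, 55, 50, 47, 53, 49, 51, 54, 46]
print(replicated_standard_error(results))
```

The second returned value corresponds to a sample of 100 families and the third to the combined sample of 1,000. Both include any variation between interviewers, as the text goes on to note.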
In practice, it may be difficult or practically impossible to achieve independent subsamples in some contexts. For example, in the construction of index numbers the choice of commodity items is usually made by a panel of experts, so that the items chosen constitute a nonprobability sample. If an attempt is made to measure the variation between the judgments of several experts at the same level of expertise, interaction phenomena that make the measurement difficult are immediately encountered. For example, the experts are likely to have consulted one another; they are likely to know the criteria of selection used by their peers; they are bound to meet in the course of examining files containing background information, and so on. Thus, although independent judgments are in principle possible, the practical problem of arranging for independence is very difficult.
Now, almost certainly, in the simple interviewing example, differences between the interviewers will affect the results of the survey (through personal characteristics, differences in thoroughness, etc.), and such differences will inflate the sampling error when it is estimated as above. If it were important to eliminate differences between interviewers from the sampling error, a slight modification in sampling design would suffice. Each interviewer must be asked to carry out at least two independent sample assignments, say four assignments of 25 families each. It will then be possible, by exactly the same argument as before, to estimate variability “within interviewers” only, thus excluding variability “among interviewers” from sampling error estimates. Further improvements in design are also possible. [See Index numbers, article on sampling.]
It will be seen that nonprobability sampling treated in this way utilizes the theory of experiment design, and, as always in that subject, care is necessary to ensure that assignments are allocated at random to interviewers. An incidental advantage of such experiment designs, which is often of greater practical importance than the original purpose of estimating sampling error, is that they make it possible to see whether interviewers are varying so much in performance that further training or other action is called for.
Such evidence as is available from designs of this sort indicates (Moser & Stuart 1953; Stephan & McCarthy 1958) that the sampling errors of quota sampling are considerably larger than those of comparable probability sampling. (If sampling is multistage, the inflation will of course only apply to the stage or stages at which quota sampling is used—this is commonly the final stage only.) Quota sampling can therefore only be justified, if at all, on grounds of reduction of costs, but it is doubtful whether, in the light of current cost structures, this factor is large enough to offset the excess sampling error of quota sampling. Some crude numerical guide is contained in the introduction to the tables by Stuart (1963).
Of course, experiment designs of this sort may be used with any ordinary probability sampling scheme (Mahalanobis 1946). The point here is that experiment designs rescue nonprobability sampling schemes from one of their worst deficiencies, the impossibility of calculating sampling error estimates, which does not hold for probability sampling. Nothing, however, can rescue nonprobability sampling from the ever-present danger of selection biases, a danger particularly acute for those forms (like quota sampling) that allow considerable freedom to interviewers to select cases.
Taken in conjunction with the sampling error and cost considerations, the bias danger suggests that, whatever may have been the case when it originated, quota sampling is now, in terms of reliable information delivered per unit cost, uneconomical as well as anachronistic.
Incompletely achieved probability samples
It has been noted that the major unremovable drawback of nonprobability samples is the danger of bias in the selection procedure. Inevitably, this danger must also arise when an intended probability sample is, through lack of cooperation, through inaccurate fieldwork, or through accidentally missing records, reduced to a fraction of the number of cases originally selected. The selected sample is, for one or another of these reasons, incompletely achieved, and the sampler’s duty is to survey the damage done to his original plan and try to assess the importance of the danger to his purposes. Strictly speaking, an incompletely achieved probability sample ceases to be a probability sample, although it usually continues to be called one. Intentions are not the same as achievement, and the sampler must see whether the courtesy title is deserved.
Before discussing what can be done in such situations, it is useful to compare the reaction of the probability sampler confronted by an incompletely achieved sample with that of the nonprobability sampler in the same position. It is an empirical fact that the probability sampler is usually much more worried about the incomplete sample than is the nonprobability sampler, because the stringency of a probability sampling scheme draws much greater attention to any weaknesses in the achieved sample. By contrast, the nonprobability sampler usually finds it easier to bring his sample numbers up to the required level. His sampling scheme is less stringent and can more easily be patched up. Advocates of nonprobability sampling (principally quota sampling) have been known to categorize this as an advantage for their method, but we shall take the contrary view. It is because nonprobability sampling is so ill-defined that its definition can be extended to cover incomplete sample fulfillment. In the kingdom of the one-eyed, the blind man is less easily seen. Perhaps the least appreciated, because it is the most troublesome, feature of probability sampling is the urge for sample completeness that it imposes upon its practitioners.
Faced with a seriously incomplete sample, the probability sampler will usually redouble his efforts, but however hard he tries, he will ultimately have to face the fact that some part of his sample is not to be completed. The characteristic rates of noncompletion vary greatly with types of sample. For probability samples of individuals in the United Kingdom, it is possible to achieve completion rates of 90 per cent, although 80 per cent is nearer the general average. For probability samples of households in United Kingdom government household-budget inquiries, the completion rates range around 70 per cent. Few surveys involve a more onerous burden upon the respondent than the keeping of a detailed household budget, so it is probably reasonable to state that completion rates in the United Kingdom should never, with careful fieldwork, fall below about 70 per cent. The range of noncompletion rates likely to be encountered is thus of the order of 10 to 30 per cent. For many purposes, the sample results will not be seriously biased by even a noncompletion rate of one-quarter. For example, if 60 per cent of all households in the completed three-quarters of the sample display a certain characteristic, the noncompleted quarter of the sample would need to have under 40 per cent or over 80 per cent with that characteristic for the bias in the completed sample to exceed 5 per cent.
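The arithmetic behind the 40 and 80 per cent limits can be checked with a short sketch. The function name `completion_bias` is hypothetical; bias is taken here to mean the difference between the proportion in the completed part and the proportion in the whole selected sample.

```python
def completion_bias(p_completed, p_missing, completion_rate):
    """Bias of the completed-sample proportion relative to the full sample.

    p_completed:     proportion with the characteristic among completed cases.
    p_missing:       proportion among the cases not completed.
    completion_rate: fraction of the selected sample that was completed.
    """
    p_full = completion_rate * p_completed + (1 - completion_rate) * p_missing
    return p_completed - p_full


# Completed three-quarters show 60 per cent; try the two limiting values.
print(completion_bias(0.60, 0.40, 0.75))
print(completion_bias(0.60, 0.80, 0.75))
```

With a completion rate of three-quarters the bias is one-quarter of the difference between the two proportions, so it reaches 5 percentage points only when the noncompleted quarter differs from 60 per cent by 20 points in either direction.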
The crucial question to be examined in assessing the likely magnitude of bias due to incompleteness is whether the causes of incompleteness are related to the questions of interest. In a study of women’s cosmetics usage, it would be running a serious risk to ignore a very low completion rate for teen-agers, who are particularly heavy users of cosmetics; if the completed sample showed a very low percentage of teen-agers compared with the selected sample (or known recent population data), special efforts would have to be made to increase that percentage. But if the completed sample showed an excess of women aged 45-55 and a shortage of women over 65, this would be unlikely to exert a significant influence on cosmetics-usage findings. However, if the subject studied were demand for domestic help, it would be the shortage of the elderly that would be the threat. Data from published population figures or from specially undertaken supplementary probability samples can often be used to check and, with appropriate methods of statistical analysis, to improve estimates suspected of bias, whether it is due to sample incompleteness or not (Moser & Stuart 1953; United Nations 1960, sec. 13). The simple examples just given show that such check data can only be used in conjunction with the experienced judgment of the investigator. If extreme imbalances of the completed sample are revealed, they are just cause for suspicion of the sample results, but the converse does not hold: no finite amount of external checking of this kind can ever fully validate a nonprobability sample, since the crucial variable to check may be overlooked or not checkable. There is no completely satisfactory substitute for a fully achieved probability sample. Any shortfall from this status is a potential threat to the inferences drawn from the sample. The investigator may be able to judge in some cases that the threat is not likely to be serious, but he can do no more than this. 
Insofar as a probability sample is fully achieved, it obviates the need to make such judgments.
To return to the first analogy, it can now be seen that the archeologist is in a situation closely resembling that of the investigator with an incomplete nonprobability sample. Another social scientist can usually improve on this situation at least to the extent of starting with a probability sample of his material. The incompleteness of his sample may on occasion compel him to make judgments of an inconclusive kind about the quality of his sample, but this is not a valid reason for extending the scope of the inconclusive judgment to the whole sampling procedure.
Alan Stuart
[For methods of accomplishing randomization, see Experimental design, article on the design of experiments; Random numbers. Also related are Errors, article on nonsampling errors; Experimental design, article on quasi-experimental design; Index numbers, article on sampling.]
BIBLIOGRAPHY
Mahalanobis, P. C. 1946 Recent Experiments in Statistical Sampling in the Indian Statistical Institute. Journal of the Royal Statistical Society Series A 109:326-378. → Contains eight pages of discussion.
Moser, Claus A. 1952 Quota Sampling. Journal of the Royal Statistical Society Series A 115:411-423.
Moser, Claus A. 1958 Survey Methods in Social Investigation. New York: Macmillan.
Moser, Claus A.; and Stuart, Alan 1953 An Experimental Study of Quota Sampling. Journal of the Royal Statistical Society Series A 116:349-405. → Contains 11 pages of discussion.
Neyman, Jerzy 1934 On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society Series A 97: 558-606.
Stephan, Frederick F.; and McCarthy, Philip J. 1958 Sampling Opinions: An Analysis of Survey Procedure. New York: Wiley.
Stuart, Alan 1962 Basic Ideas of Scientific Sampling. London: Griffin; New York: Hafner.
Stuart, Alan 1963 Standard Errors for Percentages. Applied Statistics 12:87-101.
United Nations, Department of Economic and Social Affairs 1960 A Short Manual on Sampling. Volume 1: Elements of Sample Survey Theory. United Nations, Statistical Office, Studies in Methods, Series F, No. 9. New York: United Nations.
Yates, Frank (1949)1960 Sampling Methods for Censuses and Surveys. 3d ed., rev. & enl. London: Griffin; New York: Hafner.
Surveys, Sample
Surveys are instruments that researchers use to measure attitudes, tastes, viewpoints, and/or facts from a specific population. Most populations (or groups) are too large and widespread geographically to allow researchers to obtain information on each group member. To compensate, researchers have devised several methods by which they make inferences about a population by using information gathered from a selected sample of the population.
Various government agencies (including the U.S. Census Bureau) and academic disciplines (including economics, political science, and sociology) make frequent use of survey sampling to better understand the prevailing characteristics of specific populations. This entry examines survey sampling through the two most common formats in which it is conducted: questionnaires and interviews. Before moving to this discussion, however, it is worthwhile to describe the central methodology that makes survey sampling of populations so effective—drawing the actual sample.
The most commonly used procedure in drawing a survey sample is the simple random sampling (SRS) method. In order to make effective inferences about a population, survey researchers must be confident that the sample they are using is representative of the population in which they are interested. If the sample is not representative, then bias (the misrepresentation of a population’s characteristics) can result. SRS provides assurance that the sample represents the population because each sampling unit (for example, a person) has an equal probability of being selected to participate in the survey.
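A minimal sketch of SRS, assuming the researcher already holds a complete list of sampling units (a sampling frame). The function name `simple_random_sample`, the frame of 1,000 identifiers, and the sample size are hypothetical and serve only to illustrate the idea of equal selection probability.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement.

    Every unit in the frame has the same chance, n / len(frame), of selection.
    """
    if n > len(frame):
        raise ValueError("sample size exceeds the sampling frame")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame of 1,000 person identifiers.
frame = [f"person_{i:04d}" for i in range(1000)]
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), "units drawn; selection probability =", 50 / len(frame))
```

Sampling is done without replacement so that no person can be selected twice; the fixed seed is a convenience for checking the draw, not a requirement of the method.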
Though an equal probability of selection helps mitigate selection bias in survey samples, most populations of interest to researchers vary so much that SRS alone does not provide enough confidence that the sample is truly representative. To address population variation, national polling techniques, such as those practiced by Gallup, require an initial stratification of the population before the random sample is drawn. Stratification is the division of a population into homogeneous groups, or strata, according to a specific set of dimensions (e.g., geographic location, age, sex). Once the population is divided in this manner, SRS is applied to each stratum. In most cases, survey researchers draw a proportionate stratified sample, in which the number of sampling units drawn from each stratum closely matches that stratum’s proportion of the overall population. Most surveys, whether delivered in questionnaire or interview format, make use of a stratified SRS methodology.
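Proportionate stratified sampling as described above can be sketched as follows. This is an illustration under assumptions, not a description of any polling organization's actual procedure: the function name, the three regional strata, and their sizes are hypothetical.

```python
import random
from collections import defaultdict


def proportionate_stratified_sample(units, stratum_of, n, seed=None):
    """Split units into strata, then apply SRS within each stratum.

    Each stratum receives a share of n proportional to its size in the
    population. Rounding means the total drawn can differ slightly from n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    total = len(units)
    sample = {}
    for name, members in sorted(strata.items()):
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, min(k, len(members)))
    return sample


# Hypothetical population: (identifier, region) pairs.
population = (
    [(i, "urban") for i in range(600)]
    + [(i, "suburban") for i in range(600, 900)]
    + [(i, "rural") for i in range(900, 1000)]
)
drawn = proportionate_stratified_sample(
    population, stratum_of=lambda unit: unit[1], n=100, seed=7
)
sizes = {name: len(members) for name, members in drawn.items()}
print(sizes)
```

With these made-up stratum sizes, a sample of 100 is allocated 60, 30, and 10 units, mirroring each region's share of the population.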
QUESTIONNAIRES
Questionnaires are impersonal surveys used to collect data from respondents who have been targeted as part of a sample. Questionnaire surveys have traditionally been delivered through the mail to sample respondents. The growing popularity of e-mail in the late 1990s allowed researchers to employ this method of delivery more frequently. Despite that growth, however, most questionnaire surveys did not use e-mail as of the early twenty-first century, because a relatively large number of people still had limited or no e-mail access.
The primary advantage of questionnaires, regardless of delivery method, is that they are relatively low cost. In stark contrast to interviews, questionnaires do not require the assistance of trained staff. In addition, because it costs the same to mail a survey three miles or three thousand miles, there are financial advantages to conducting national surveys. Access to bulk mail rates can provide even greater savings for researchers. Another advantage of questionnaires is that they reduce the bias that personal interviewers may introduce. Whenever one person is talking to another to obtain information, an interpersonal dynamic is introduced that can alter the way a respondent answers questions. Because questionnaires are delivered on paper or by computer, this dynamic is absent. Finally, questionnaires provide greater anonymity, in large part because no interviewer is aware of the respondent’s identity.
There are also disadvantages to questionnaires. Primary among these is that survey questions must be fairly simple so as to be understood by the vast majority of intended respondents. If questions are too complex or vague, respondents may miss the point of a question entirely, thereby introducing response bias. Also problematic is the inability of researchers to probe respondents for more specific information on topics. Answers are final. In addition, researchers have no control over who actually completes the questionnaire since they have no direct contact with the respondent. Finally, researchers face low response rates (20 percent to 40 percent) when using questionnaire-based surveys. Most published research using data collected from mail surveys reports a response rate between 20 and 30 percent, although the rate is sometimes higher for targeted populations. The Internet’s popularity has helped to increase response rates by allowing researchers to follow up with respondents through a hybrid approach in which both mail and e-mail requests for questionnaire completion are transmitted to respondents.
INTERVIEWS
Personal interviews form the backbone of modern opinion polling. Usually conducted by a team of well-trained interviewers, these interviews enable polling companies to receive respondent data in a much shorter time frame than is required for questionnaires. Most interviews in opinion polling are of the schedule-structured variety, in which all respondents are asked the same questions, in the same order, in the same way so as to reduce response bias. Other interview forms include the focused (which allows the interviewer to ask probing questions depending upon how a respondent answers) and nondirective (in which the interviewer provides little structure or form). The focused and nondirective approaches are usually employed by academics focusing on a small sample of respondents in order to build empirical theories.
The primary advantage of the structured interview is that it gives researchers better control over the interview situation. The most direct improvement of interviews over questionnaires is that someone other than the intended respondent is unlikely to provide the answers. Interviews also have a much higher response rate than questionnaires (usually 95 percent), adding to their usefulness when time is of the essence. Of course, there are disadvantages, not the least of which is the interviewer bias described above. Cost is also a disincentive.
SEE ALSO Attitudes; Internet; Methods, Research (in Sociology); Polls, Opinion; Random Samples; Research; Research, Survey; Social Science; Survey; Tastes
BIBLIOGRAPHY
Frankfort-Nachmias, Chava, and David Nachmias. 2000. Research Methods in the Social Sciences. 6th ed. New York: Worth.
Richardson, Stephen, Barbara S. Dohrenwend, and David Klein. 1965. Interviewing: Its Forms and Functions. New York: Basic Books.
Schaefer, David R., and Don A. Dillman. 1998. Development of a Standard E-Mail Methodology. Public Opinion Quarterly 62: 378–397.
Weisberg, Herbert F., Jon A. Krosnick, and Bruce D. Bowen. 1996. An Introduction to Survey Research, Polling, and Data Analysis. 3rd ed. Thousand Oaks, CA: Sage.
Brian Calfano