Statistics as Legal Evidence

One of the normal functions of a legal trial is to resolve an uncertain factual situation according to some canon of probability: in criminal cases by evidence that establishes guilt “beyond reasonable doubt,” in civil cases by a “preponderance of the evidence.” In spite of these probabilistic terms, the law “refuses to honor its own formula when the evidence is coldly 'statistical.'” As a rule, “probabilities are determined in a most subjective and unscientific way” (Hart & McNaughton 1958, p. 54). A plaintiff who established that he had been negligently run over by “a bus,” and who tried to identify the liable defendant by proving that Company X held the only regular bus franchise on the street where he was hit, was denied recovery on the ground that other buses occasionally traveled that street and that probability, however great, was not sufficient identification. In a trial for income tax evasion involving an illegal lottery wheel, expert testimony that, according to mathematical probabilities, the take from the wheel was twice as large as the defendant had reported was excluded as irrelevant. On the other hand, in an internal investigation of alleged rigging of a civil service examination, a chi-square test for goodness of fit showed that the distribution of the obtained grades, or a more extreme one, was highly improbable under the hypothesis of no cheating. This led to further investigation and eventual proof of chicanery, albeit not proof in court (McCann Associates 1966, p. 16).

However, in a recent Swedish trial for overtime parking, a figure was put on what constitutes insufficient probability for a finding of “guilty.” The police constable had marked the position of the valves on two tires on a standard sketch, accurate to the nearest “hour,” one valve at the “1 o'clock” position and the other at the “12 o'clock” position. Returning after a lapse of time greater than the permitted length of parking, the constable found both valves in the same positions they had been in earlier. The defendant claimed he had left the place and returned. The Court of Appeals considered odds of (12 × 12 =) 144:1 insufficient and declared that registering the position of all four tire valves would have been sufficient, at odds of (12⁴ =) 20,736:1 (“Parkeringsfrågor ...” 1962, pp. 24, 25). The court may be pardoned for having overstated the odds by calculating them under the assumption that the positions of the valve markings are statistically independent of each other, which is almost certainly not true.
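The court's arithmetic can be reproduced in a few lines. The sketch below makes the court's own assumption explicit: each valve, once the car has moved, comes to rest uniformly at random on one of the 12 "hour" marks, independently of every other valve. The function names are hypothetical. Strictly, a chance of 1 in 144 corresponds to odds of 143 to 1 against a coincidence; the sketch follows the court in quoting the reciprocal of the chance.

```python
from fractions import Fraction

CLOCK_POSITIONS = 12  # the sketch records each valve to the nearest "hour"


def chance_match_probability(num_valves: int, positions: int = CLOCK_POSITIONS) -> Fraction:
    """Chance that every recorded valve is found again at its marked hour.

    ASSUMPTION (the court's, and the one the article doubts): each valve
    stops uniformly at random on one of `positions` hour marks,
    independently of every other valve.
    """
    return Fraction(1, positions) ** num_valves


def court_style_odds(num_valves: int, positions: int = CLOCK_POSITIONS) -> int:
    """The court's figure: the reciprocal of the chance of a coincidence."""
    return int(1 / chance_match_probability(num_valves, positions))


for valves in (2, 4):
    print(valves, "valves:", chance_match_probability(valves), "->",
          f"{court_style_odds(valves):,}:1")
```

The independence assumption is what makes each additional valve multiply the odds by 12. In the hypothetical opposite extreme, where all wheels turned in perfect step, the position of one valve would fix all the others, and the chance of a coincidence would stay at 1 in 12 however many valves were recorded.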

But this distrust of statistical evidence is directed primarily against statistical proof of individual, specific events. Whenever some measurement of a large universe is at issue, such as the share of a market or the proportion of people holding a certain view, statistical evidence is clearly the best, if not the only, evidence obtainable. It is in such contexts that statistical evidence is playing a growing role in litigation.

Objections to statistical evidence. Objections to statistical evidence come from two sources: the hesitation to accept sample results in place of complete census counts and, at least in the Anglo-Saxon legal sphere, the evidentiary rule against hearsay, which applies when the statistic is based on surveys that involve interviews. Of the two, the objection to sampling is less stubborn. The example set by the U.S. Bureau of the Census helped to pave the way. Statutes in many states make census and other published governmental statistics prima facie legal evidence, although many “census” data are based on samples and all of them are hearsay evidence many times removed.

The hearsay rule is the more serious obstacle. The law holds that testimony must be open to cross-examination in order to test accuracy of perception, reliability of memory, and sincerity. The law allows exceptions to this rule, thereby inviting certain kinds of surveys (see below), but has been slow to admit statistical evidence in general. Old-fashioned doctrine will often allow testimony of selected witnesses who are far from constituting a representative sample but will refuse admission of a survey based on a representative sample because, technically, it is hearsay evidence. That a carefully conducted survey is a source of truth superior to the testimony of such selected witnesses has been convincingly argued, and on the whole the courts are learning to appreciate this position.

Areas of acceptance. Statistical evidence has found more general acceptance in three areas: in proceedings before administrative agencies, which are not bound by evidentiary rules; in antitrust cases, where measurements of market shares have become almost indispensable; and in surveys of what the law calls the witness' “state of mind,” a broad area that is specifically exempt from the hearsay rule and that has spawned much statistical evidence. If a respondent is asked, for instance, whether he believes two different trademarks represent the same or different manufacturers, the survey maker knows what the true facts are; all he wants to find out is whether the interviewee knows them too. Following are the major types of disputes in which survey evidence often forms the core of proof.

Consumer awareness. Sometimes the law provides that a trademark or advertising slogan can be protected only as long as it is in sufficient use, that is, sufficiently established in the consumer's mind (Verkehrsgeltung in German trademark law).

Confusion of trademarks. A trademark that is so similar to an already existing one that the two are likely to be confused may not be registered. The similarity might be created by similar words, by similar design, by a similar color, or by any combination of these factors.

Meaning of trademarks. The requirement of truth in trademark labeling occasionally imposes the burden of finding out what certain words mean; the issue may be, for instance, whether a term such as “English lavender” or “farmer bread” denotes true origin or merely a type of product.

Proprietary or generic name. Names, originally protected as brand designations, lose their proprietary character if they have in fact become generic terms, designating the type of product rather than one of its brands. In the United States, “Thermos” has become a generic term in this fashion. “Vaseline” has lost its proprietary character in some European countries but not in the United States, where it is a specific brand of petroleum jelly.

Misleading advertising. The Federal Trade Commission and the Food and Drug Administration in the United States have the duty to prohibit false advertising claims. In such proceedings two questions arise: a factual one, namely what the product actually does, and a psychological one, namely what the public, judging from the advertising claims, thinks it does. Thus, it was litigated whether coal made from corncobs could claim to be charcoal. The issue involved both a chemical question (whether wood charcoal is different from corncob charcoal) and a psychological one (what the public understood charcoal to be and whether the difference, if perceived, mattered).

Preparation of legal surveys. There is no need here to discuss the general rules of procedure that must be observed in the preparation of surveys for purposes of legal evidence. But there are several problems peculiar to legal surveys that deserve mention. They derive from a variety of sources: from issues of law that cannot be anticipated with precision because they may be decided only during the very trial for which the evidence is prepared; from the peculiarly strict requirements of legal proof; and, unless it is a “state of mind” survey, from the hearsay rule.

Problems generated by the hearsay rule. At times, choosing a different survey design may circumvent the hearsay rule. When the geographic range of patrons of a drive-in movie theater was at issue, the data were obtained not through interviews with the patrons but through the recording and subsequent tracing of the license numbers of the parked automobiles. In this way the field workers who jotted down the numbers remained competent witnesses even under the hearsay rule.

Occasionally a court may offer to remedy the hearsay defect by allowing the survey evidence to be verified through the testimony of some of the original interviewees. This is a futile and, one may hope, passing remedy. But the dilemma it presents is real. In order to verify the fact of interviewing, the names of the interviewees may have to be presented in court, and there is then no way of protecting these interviewees from being subpoenaed as witnesses. This raises a serious problem of interviewing ethics, since survey interviewees are implicitly reassured of the privileged nature of their answers. If it became widely known that such protection cannot be guaranteed, people might decide to refuse cooperation in surveys. A similar problem, ironically, has arisen for the census itself, which, by law, is a privileged communication. One possible way, incidentally, of ensuring privacy to survey respondents is to detach the interviewees' names from their questionnaires, thus making it possible to present their identity to the court but making it impossible to connect any individual with his specific questionnaire answers. This procedure, however, renders cross-examination of survey respondents almost useless, since there is no way of confronting their court testimony with their survey response.
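The detaching procedure can be sketched as splitting each completed interview into two separately shuffled lists: a roster of names that can be shown to the court, and answer sheets that carry no name. The record layout (a list of name and answer pairs) and the function name are hypothetical; the sketch also assumes the answer sheets contain no other identifying fields.

```python
import random


def detach_identities(completed):
    """Split completed interviews into an unlinked roster and answer sheets.

    `completed` is a list of (name, answers) pairs, a hypothetical layout;
    `answers` is assumed to hold no identifying fields. Both lists are
    shuffled independently with an OS-seeded generator, so their order
    reveals no pairing and no retained seed could re-link them.
    """
    rng = random.SystemRandom()
    roster = [name for name, _ in completed]
    sheets = [dict(answers) for _, answers in completed]
    rng.shuffle(roster)
    rng.shuffle(sheets)
    return roster, sheets


interviews = [  # hypothetical respondents and answers
    ("Respondent A", {"q1": "yes"}),
    ("Respondent B", {"q1": "no"}),
    ("Respondent C", {"q1": "yes"}),
]
roster, sheets = detach_identities(interviews)
print(sorted(roster))
print(sheets)
```

The roster verifies that the interviews took place, while no name can be tied to a particular sheet; this is exactly why, as noted above, a respondent's court testimony can no longer be confronted with his own survey answers.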

Problems generated by proof requirements. The peculiar requirements of legal evidence affect the preparation of sampling surveys in several ways. There is, first, the prospect of a double scrutiny by opposing counsel. The first scrutiny occurs when the admissibility of the particular piece of evidence is debated; at this stage, the opposing side will try to prove that it is on its face irrelevant to the litigated issue, or that it has such obvious technical flaws that the court would be well advised to refuse its admittance. If the offered evidence overcomes this hurdle, its probative value is then explored in even greater detail through cross-examination like that of any witness.

This double scrutiny is often more exacting than it deserves to be. The discovery of but one serious flaw may endanger the entire piece of evidence. The doctrine of falsus in uno, falsus in omnibus is sometimes used to justify dismissal of a witness' entire testimony if it is found to be untrue in a single instance; and by way of analogy, it may be applied to the witness who presents survey evidence.

This witness, therefore, should always be an expert witness, able to defend the evidence, to explain the meaning of technical terms such as “chi-square test,” and to answer such ever-recurring questions as why the sampling error hardly ever depends on the proportion the sample constitutes of its universe, but only on the absolute size of the sample.
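The answer to that recurring question lies in the standard error of a sample proportion. Under simple random sampling (an assumption of this sketch; the function name and the figures in the usage lines are illustrative), the sampling fraction n/N enters only through the finite population correction, which is close to 1 unless the sample is a sizeable share of its universe.

```python
import math
from typing import Optional


def standard_error(p: float, n: int, population: Optional[int] = None) -> float:
    """Standard error of a sample proportion under simple random sampling.

    If `population` (N) is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied; it is the only place where the
    sampling fraction n / N enters the formula.
    """
    se = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return se


n, p = 1_000, 0.5  # illustrative sample size and proportion
print(f"no correction: {standard_error(p, n):.4f}")
for N in (10_000, 1_000_000, 100_000_000):
    print(f"N = {N:>11,}: {standard_error(p, n, N):.4f}")
```

A sample of 1,000 gives a standard error of about 0.016 whether the universe holds a million or a hundred million people; only when the sample is a tenth of the universe does the correction lower it noticeably, to about 0.015.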

Occasionally, the dangerous pretense is made that survey findings are the results of simple, common-sense procedures whose validity can be appraised without special expertise by any judge or jury. It is essential that both sides have experts; if one side refuses to have one, it does so at its peril. This holds even for the rare, if desirable, situation in which the statistical evidence is prepared by an expert whom the litigants and the court jointly appoint.

The strict requirements of legal proof make it essential that the chain of statistical inferences be meticulously documented. The definition of the universe, the details of the sample design and of the sample selection procedure, the communications to the interviewers, their control in the field, and, finally, the analysis of their reports should all be documented by the respective research instruments and, if necessary, by ad hoc working memoranda.

Since the interviewees themselves might have some interest in the litigated issue, its nature (if possible, its very existence) should not be divulged to them. The safest way of doing this is to keep even the interviewers from learning the purpose of the survey. If that should prove impossible, the interviewers should certainly not learn which side in the litigation is sponsoring the survey.

Problems generated by legal uncertainties. One of the major difficulties in preparing survey evidence results from legal uncertainties that are likely to be decided only in the very trial for which the evidence is prepared. Foremost among these uncertainties is the definition of the relevant universe to be sampled. In a trademark confusion case, for instance, is it the purchasers of the particular brands, the purchasers of any brand of this type of product, or simply all potential customers? The proper way of solving such a problem is to sample all three universes and to tabulate the results for each separately as well as for all possible combinations.
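One way to organize such a tabulation is to classify each respondent into mutually exclusive segments, of which the three universes are unions, and then compute the result for every non-empty combination of segments. The segment labels, the record layout, and the toy responses below are hypothetical illustrations, not data from any case.

```python
from itertools import combinations

# Hypothetical, mutually exclusive respondent segments. The universes in the
# text are unions of them: buyers of the litigated brands = (LITIGATED,);
# buyers of any brand of the product = (LITIGATED, OTHER);
# all potential customers = (LITIGATED, OTHER, NON_BUYER).
LITIGATED, OTHER, NON_BUYER = "litigated-brand buyer", "other-brand buyer", "non-buyer"
SEGMENTS = (LITIGATED, OTHER, NON_BUYER)


def tabulate_by_universe(responses):
    """Share confused for every non-empty combination of segments.

    `responses` is a list of (segment, confused) pairs with `confused` a bool.
    Returns {combination: (respondents, share_confused)}.
    """
    table = {}
    for size in range(1, len(SEGMENTS) + 1):
        for combo in combinations(SEGMENTS, size):
            answers = [confused for segment, confused in responses if segment in combo]
            if answers:  # skip combinations with no respondents
                table[combo] = (len(answers), sum(answers) / len(answers))
    return table


toy = [(LITIGATED, True), (LITIGATED, False), (OTHER, True), (NON_BUYER, False)]
for combo, (count, share) in tabulate_by_universe(toy).items():
    print(" + ".join(combo), count, f"{share:.0%}")
```

Whichever universe the court finally adopts, the corresponding row is then already on hand.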

Another uncertainty concerns the level of precision at which the survey answers will be relevant. It is sometimes impossible to focus the interviewee's attention on a particular issue without giving him some information about the issues in litigation. Thus, one often buys a more precise answer at the price of some contamination and even possible bias.

Consider the following sequence from a questionnaire designed to explore the respondent's knowledge of a certain merger:

Question: Do you recall any mergers of cement companies in this area during the last two or three years?

[If no reference is made to the litigated merger, ask:]

Question: Did you know that the XX-Corporation merged with another company?

[If the answer is yes, ask:]

Question: Do you happen to know the name of that other company?

Since it is difficult to predict which level of precision the court will accept as relevant, the rule must be to begin with the uncontaminated, unaided questions and to make sure that the contamination, if it is necessary, is introduced as late as possible, so that whatever answers were obtained before remain clean. At the very end of the interview even leading questions may be proper, provided their character is openly admitted, just as such questions in cross-examination in court sometimes have their justification.
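The skip pattern of the merger questionnaire can be expressed as a small classification: each respondent is credited with the cleanest level at which he showed knowledge, and the results are then pooled under successively looser standards, so the court can choose among them without any earlier answer being contaminated by a later question. The level names and the order in which they are pooled are an illustrative reading of the questionnaire, not a standard the courts have fixed.

```python
LEVELS = (
    "unaided recall",        # Q1 alone: no contamination
    "aided, named partner",  # told of the merger (Q2), then supplied the name (Q3)
    "aided awareness only",  # Q2 yes, Q3 no
    "no knowledge",          # Q2 no
)


def classify(q1_mentions_merger, q2_knew=None, q3_named_partner=None):
    """Return the cleanest level at which a respondent showed knowledge.

    Mirrors the skip pattern: Q2 is asked only if Q1 did not bring up the
    litigated merger, and Q3 only if Q2 was answered yes.
    """
    if q1_mentions_merger:
        return LEVELS[0]
    if not q2_knew:
        return LEVELS[3]
    return LEVELS[1] if q3_named_partner else LEVELS[2]


def cumulative_shares(levels):
    """Share counted as knowing the merger under each successively looser standard."""
    total = len(levels)
    shares, running = {}, 0
    for level in LEVELS[:-1]:
        running += levels.count(level)
        shares[level] = running / total
    return shares


answers = [  # hypothetical respondents
    classify(True),
    classify(False, True, True),
    classify(False, True, False),
    classify(False, False),
]
print(cumulative_shares(answers))
```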

Then there is the issue of realism, of the significance of merely verbal response. To avoid objections, the real problem situation should be simulated as closely as possible. For instance, instead of taking a housewife's word that if the price differential between two brands reached a certain level, she would switch, the following design was employed in a survey: One sample of housewives was given a choice of buying either brand A or brand B at the same price; a comparable sample was given the choice of A, or B at 1 cent less; a third sample the choice of A, or B at 2 cents less, and so forth. To ensure an actual purchase, the housewife was promised a gift, slightly but clearly more valuable than the price of the purchased merchandise, if she completed the experiment. Actually, the purchase price was also returned to the housewife after she had made the test purchase. In another litigation, where the types of channels used for transmitting telegrams to various places overseas were at issue, the survey maker simply sent a number of real telegrams and produced their overseas receipts in court.
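The analysis of the price-differential design reduces to comparing independent samples: for each differential offered, the share of housewives who actually bought brand B. The sketch below assumes purchase counts are recorded per sample; the data layout, function names, the 50 percent threshold, and the counts in the usage lines are all hypothetical.

```python
def share_choosing_b(groups):
    """Share of each sample that bought brand B at the differential offered.

    `groups` maps the differential in cents (B cheaper by that much) to a
    (chose_a, chose_b) pair of purchase counts from one independent sample.
    Returns the shares in order of increasing differential.
    """
    return {cents: b / (a + b) for cents, (a, b) in sorted(groups.items())}


def switching_point(groups, threshold=0.5):
    """Smallest differential at which B's share reaches `threshold`, or None."""
    for cents, share in share_choosing_b(groups).items():
        if share >= threshold:
            return cents
    return None


samples = {0: (60, 40), 1: (55, 45), 2: (45, 55), 3: (30, 70)}  # hypothetical counts
print(share_choosing_b(samples))
print("B reaches half the purchases at", switching_point(samples), "cents less")
```

Because each differential is tested on a separate, comparable sample, no housewife's choice is influenced by having seen another price, which is the point of the design.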

Finally, there is the very special problem that arises in surveys that are to measure confusion. Confusion usually has several dimensions, and the research designed to measure it must consider all of them. In a study designed to measure confusion about trademarks, for example, there is, first, the confusion that will arise in the minds of people who simply do not know what the essential (that is, the protected) features of a trademark are. These people may confuse the two marks even if they note their difference. Second, there is the optical confusion that results from failure to notice a difference that is simply too small to be seen. And, third, there is the “normal” confusion that must be discounted in the research design: the confusion that will prevail among the fringe of particularly inattentive people, who will register confusion even if there is not the slightest ground for it.
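One common way of discounting the "normal" confusion, assumed here rather than taken from the article, is a control sample: a comparable group is shown a mark with no resemblance to the plaintiff's, and its confusion rate is subtracted from that of the group shown the challenged mark. The sketch addresses only this third dimension; the function name and the counts in the usage line are hypothetical.

```python
import math


def net_confusion(confused_test, n_test, confused_control, n_control):
    """Confusion attributable to the challenged mark, net of "normal" confusion.

    ASSUMPTION: a comparable control sample is shown a mark with no
    similarity to the plaintiff's, so its confusion rate estimates the
    baseline among inattentive respondents. Returns (net_rate, standard_error)
    for the difference of the two independent proportions.
    """
    p_test = confused_test / n_test
    p_control = confused_control / n_control
    se = math.sqrt(p_test * (1 - p_test) / n_test
                   + p_control * (1 - p_control) / n_control)
    return p_test - p_control, se


net, se = net_confusion(90, 300, 30, 300)  # hypothetical counts
print(f"net confusion {net:.1%} (standard error {se:.1%})")
```

With these made-up counts, 30 percent confusion in the test group against 10 percent in the control leaves a net figure of 20 percent, with a standard error of about 3 percentage points.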

Statistics on validity of proof. This article has been limited to statistical evidence in a narrow sense of the term, that is, mostly sample surveys. But statistical questions may arise with respect to other forms of legal proof, such as blood tests for the establishment of paternity (Ross 1958, p. 466; Steinhaus 1954; Łukaszewicz 1955), lie detector tests (Levitt 1955, p. 440), identification of handwriting (Levin 1956, pp. 632, 637), or with respect to psychiatric diagnosis introduced as evidence (Schmidt & Fonda 1956, p. 262; Ash 1949, p. 272). All of these procedures are far from infallible, and efforts have been made to measure their fallibility in statistical terms. Psychologists, moreover, beginning with Münsterberg (1908), have been occupied with the problem of reliability of observation and testimony. They have accumulated a large body of statistics on the difficulty of correctly observing moving objects or quickly developing scenes and of correctly identifying voices, on the reliability of children's testimony, and even on the reliability of psychiatric diagnosis (Marston 1924; Hutchins & Slesinger 1928; Gardner 1933, pp. 391, 407; Messerschmidt 1933, p. 422; McGehee 1937, p. 249). What these studies have in common is that they are statistical evidence once removed from the courts: they contain statistics on legal evidence, hardly ever evidence itself.

Hans Zeisel

[See also Legal reasoning; Psychiatry, article on Forensic psychiatry; Sample surveys.]


