Content Analysis
Content analysis is used in the social sciences as one means of studying communication—its nature, its underlying meanings, its dynamic processes, and the people who are engaged in talking, writing, or conveying meaning to one another. Although not a research method sui generis, content analysis is roughly distinguishable from other methods by two characteristics. First, its data—in contrast to ethnographic reports, for example, or census enumerations—are the verbal or other symbols which make up the content of communications (letters, books, sermons, conversations, television programs, therapeutic sessions, paintings, and the like). Second, its procedures differ in emphasis from those of the historian or literary critic: they aim to be exact and repeatable, to minimize any vagueness or bias resulting from the judgments of a single investigator. Thus, each content analysis employs an explicit, organized plan for assembling the data, classifying or quantifying them to measure the concepts under study, examining their patterns and interrelationships, and interpreting the findings.
Within these broad limits the techniques of content analysis are diverse, and the objectives range from mapping propaganda campaigns, for example, to explaining international conflict and integration; from abstracting the ideas and beliefs expressed in folklore or movies of a given period to tracing the epochal alternations in societal values over many centuries; from charting the interaction between patient and therapist to assessing the psychological states of great men in the past.
No general theory of communication is yet in common use among the several social sciences to guide these varied analyses. Implicit in each investigation is a special conceptual model, or set of ideas and assumptions, about the nature of the particular communication process under study. To test this conceptual model or to add new ideas to it, the researcher uses the concrete data of communication. In the empirical phase of the research he is led by his model to select particular communications and to search for order among them by adapting certain conventional procedures of sampling, measurement, and analysis. In the interpretative phase, in comparing his findings with his initial conceptions, so as to understand their broader significance, he encounters certain special problems and possibilities.
Historical background
The use of content analysis in the social sciences today—its methods and its problems of interpretation—has been affected both by related developments in other fields and by historical demands for certain practical applications. Early in the twentieth century students of journalism began to count the newspaper linage devoted to foreign affairs or to sports, comparing one newspaper with another, for example, and later comparing newspaper content with the content of other media. In literary criticism such devices as type of rhyme or ratio of adjectives to verbs were tabulated, as a means of differentiating the styles of writers or of literary periods or to settle disputes about authorship or the chronology of an author's work. Meanwhile, educators were constructing formulas for the readability of printed materials, utilizing proportions of easy and hard words, length of sentences, and the like.
During the 1930s certain applications of such techniques began to be made in the social sciences. Sorokin's monumental study of social and cultural changes in western Europe over the entire course of history rests in part upon an analysis of works of art, music, literature, and philosophy according to their central meanings (1937–1941). Lasswell developed a scheme for categorizing the content of patients' responses in psychiatric interviews as pro-self, anti-self, pro-other, or anti-other, and for counting the frequency with which such categories occurred (1938). Lasswell also, with a number of associates, pioneered the application of content analysis to the study of public opinion and propaganda, an effort immensely stimulated by the demands of the United States government during World War II. Mass communication was conceptualized, within a political framework, as “who says what to whom, how, with what effect,” and large-scale analyses were made, for example, of the frequency with which key symbols (democracy, communism, England, Hitler) were given indulgent, deprecatory, or neutral presentation (e.g., Lasswell & Leites 1949). These wartime efforts encouraged content analyses in other areas—focused on the intentions of particular communicators, the kinds of material brought to the attention of particular audiences, or the cultural values underlying the communicator's assessment of what the audience wants.
When Berelson (1952) made his critical survey of the applications of content analysis methods, he found several books and articles reporting the use of various techniques, e.g., techniques of sampling the content of newspapers by successively selecting specific newspapers, issues, and relevant content within each issue; techniques of categorizing and counting key words, themes, or whole documents; techniques of increasing the reliability of classifying and counting. To guide the application of such techniques, however, Berelson found only one conceptual model in widespread use: the Lasswellian model of the purposive one-way communication intended to influence a mass audience on controversial public issues (Berelson 1952, p. 57 and passim). Thus, broader scientific utilization of the available techniques waited upon a growing interdisciplinary understanding of the many-faceted communication process and upon the closer fitting of techniques to theory.
Examples of uses of content analysis
Important developments in the application of content analysis to social science models may be illustrated by a few examples from the profuse literature of the 1950s and 1960s (see also Work Conference on Content Analysis 1959).
Interaction process
Bales and his associates have developed one of several procedures for analyzing the content of communication observed in small groups (e.g., Bales 1952). Observers sitting behind a one-way screen categorize each of the remarks and gestures (acts) directed by each group member to other members as the group attempts to solve an assigned problem. Bales's standard set of 12 categories (shows solidarity, shows tension release, agrees, gives suggestions, etc.) indexes certain sociological properties of the interaction of a group: positive or negative direction, instrumental or expressive character, and the focus on such system problems as control, tension management, or integration. Thus categorized, data from many groups are used (with the aid of statistical devices and mathematical models) to describe the group process—the patterning of content, phasing over time, group structure. From these descriptions inferences are drawn about the underlying nature of this process. For example, the findings may show that typically a group leader emerges who both initiates and receives more communications than any other member; or that the process of problem solving goes through phases emphasizing, first, orientation; then, evaluation; and, finally, control. [See Interaction, article on Interaction process analysis.]
Studies of therapy. Such analyses of the content of interaction, made at the time of observation or, later, through recordings or transcripts, are also applied to the therapeutic process in social work, counseling, and psychiatry (e.g., the review by Auld & Murray 1955). Category systems based on psychological theories of behavior combined with principles of client-centered therapy are used to trace the interview process, the client-therapist relationship, changes in predominant content over time, or differences between types of treatments. Standard measures employed in content analysis of psychotherapy include Bales's categories (1952) and the Discomfort-Relief Quotient (D.R.Q.), developed by John Dollard and O. H. Mowrer (see, e.g., Auld & Murray 1955, pp. 379–380) to show the ratio between the client's discomfort responses (reflecting tension, unhappiness, pain) and his relief responses (reflecting satisfaction, comfort, enjoyment). In Japan (Shiso … 1959) content analysis has been applied to the exchanges of letters published in life-counseling columns of newspapers and magazines. [See Mental disorders, treatment of, article on Client-centered counseling.]
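As a schematic illustration of how such a standard measure works, the Python sketch below computes a D.R.Q. under one common formulation (discomfort units divided by the sum of discomfort and relief units); the coded session is hypothetical and is not drawn from any study cited here.

```python
# A minimal sketch of a Discomfort-Relief Quotient, assuming the common
# formulation D.R.Q. = discomfort units / (discomfort units + relief units).
# The coded session below is hypothetical.

def drq(coded_units):
    """Return the D.R.Q. for a list of unit codes, or None if undefined."""
    discomfort = sum(1 for code in coded_units if code == "discomfort")
    relief = sum(1 for code in coded_units if code == "relief")
    if discomfort + relief == 0:
        return None  # no affect-bearing units; the quotient is undefined
    return discomfort / (discomfort + relief)

# Each clause of a session transcript has been hand-coded beforehand as
# "discomfort", "relief", or "neutral".
session = ["discomfort", "discomfort", "neutral", "relief", "discomfort"]
print(drq(session))  # 3 discomfort units vs. 1 relief unit -> 0.75
```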
Psychological state of the communicator
Analysis of an individual's communications as an index of his underlying motives is exemplified in a study of suicide letters by Osgood and Walker (1959). Within the framework of stimulus-response theory these researchers formulated a number of predictions about the structure and content of suicide letters in contrast to ordinary letters and to simulated suicide notes: the letters of bona fide suicides are characterized, for example, by greater stereotypy; more evidences of conflict; more constructions of the demand, command, and request type that express needs of the speaker and require some reaction from another person to satisfy these needs.
To test these predictions, the researchers use 16 different measures—some already standard and some specially designed—for the analysis and comparison of the letters. As measures of the stereotypy of each letter, for instance, they divide the number of different words by the total words, count repetitions of phrases, or take the ratio of nouns and verbs to the number of adjectives and adverbs. To measure conflict, they determine the degree to which assertions are qualified, the number of syntactical constructions expressing ambivalence (such as “but,” “if,” “however”), or the extent to which both positive and negative assertions are combined in the same letter. They also employ Osgood's evaluative assertion analysis, a standard procedure that isolates from their context all terms by which the communicator evaluates an object, rates these terms as favorable or unfavorable, and then combines these ratings to index the communicator's over-all attitude toward the object (see, e.g., Work Conference on Content Analysis 1959, pp. 41–54). Comparisons of such measures for suicide letters and ordinary letters support the researchers' hypotheses in most instances, although the results for suicide versus simulated suicide notes are less clear.
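Two of the simpler measures named above can be illustrated briefly. The Python sketch below computes a type-token ratio (different words divided by total words; a lower value suggests greater stereotypy) and counts ambivalence markers; the marker list and the sample sentence are illustrative inventions, not materials from the study.

```python
# Sketch of two of the simpler measures described above. The marker list
# merely echoes the examples given in the text.
import re

AMBIVALENCE_MARKERS = {"but", "if", "however"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    """Different words divided by total words; lower values indicate
    greater stereotypy (more repetition of the same words)."""
    words = tokenize(text)
    return len(set(words)) / len(words) if words else 0.0

def ambivalence_count(text):
    """Count syntactical constructions expressing ambivalence."""
    return sum(1 for w in tokenize(text) if w in AMBIVALENCE_MARKERS)

letter = "I wanted to stay, but I could not. If only things were different."
print(type_token_ratio(letter), ambivalence_count(letter))
```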
Historical personalities. In similar fashion, content analysis seems potentially useful to the historian or biographer who is seeking to understand the personalities of great men through their writings and speeches. For solving questions about authorship of documents, rigorous procedures have been developed (as in the study of the disputed Federalist papers: Mosteller & Wallace 1964). For examining motives, attitudes, and psychological states of historical persons, however, content analysis of their communications has been less widely used, despite suggestive studies of Goebbels's diary or the autobiography of Richard Wright (see Garraty in Work Conference on Content Analysis 1959, pp. 171–187).
Culture and society
In their study of popular religion, Schneider and Dornbusch (1958) illustrate the use of content analysis to reflect, not the psychological states of single persons, but the values of an entire society. These researchers selected 46 representative works of American inspirational literature, published over an 80-year period, choosing best sellers to assure that the books were read. They classified each, paragraph by paragraph, according to its main themes (“religion brings physical health,” “happiness can be expected by most men”) and then ascertained what proportion of the total paragraphs is devoted to each theme. In their report they described the changes in these themes within the wider context of historical trends in American religion and culture and used a sociological model to interpret the functions of popular religion in society.
A considerable tradition of such analyses rests on the assumption that cultural values which have been institutionalized in certain segments of the society are represented in the communications of individuals from these segments. Some analyses stress the social determinants of the ideas or values expressed in folklore and sermons, for example; some emphasize the cultural determinants of such expressions. Thus, public communications of American business leaders are taken to reflect their business creed; Brazil's riddles and myths, to reflect the didactic aspects of its religion; Japan's popular songs, to reflect the loneliness and helplessness of its postwar era (Shiso … 1959, p. 122).
Empirical methods
Content analyses are conducted by selecting and adapting certain empirical procedures used in social science generally: typically, the methods of using available data (although new materials for special objectives may occasionally be acquired by questioning or observing) and the methods of measurement combined with sampling and statistical analysis.
Use of available data
Most commonly, the content analyst chooses from the vast store of communications already available in libraries, clinics, archives, records, and family attics. Thus, he must know how to utilize the benefits of available data, while avoiding their pitfalls.
Advantages. Several advantages accrue to the student of communication who decides to use materials that already exist rather than to elicit new ones. (1) Time, labor, and expense can often be saved when the researcher can go directly to the heart of his analysis, bypassing preliminary field work, experimentation, or commissioning of documents. (2) When massive data are required, beyond the scope of a single new study, existing content materials frequently afford wide ranges of potentially relevant variables and of refinement in the measurement of each variable. (3) Most important, the available data afford the only means of studying certain kinds of communication problems. Past events cannot be observed directly by the researcher, nor can events beyond the recollection of respondents living today be reached through questioning. Thus, the analysis of historical situations or of long-term trends—the important study of social change—depends upon the prior existence of relevant materials. Similarly, study of cross-cultural communications from remote places (e.g., of world-wide tastes in movies or folklore or of similarities and differences in attention to major political symbols in different countries) may require materials that cannot be elicited by the researcher directly. Communication contents in technical fields that are beyond the competence of the researcher may have been originally assembled in usable form by an expert such as a psychiatrist, a social worker, or an ethnographer. Sometimes, as in letters or diaries, existing materials may provide deep insights into intimate feelings or personal relationships; and sometimes, as in Sorokin's analysis, they may widen the investigator's focus to include macroscopic social or cultural systems.
Pitfalls. Against such impressive assets must be set certain basic problems to be overcome in the utilization of data not originally assembled for the present purposes. (1) The materials are often incomplete. The content analyst must attempt to discover any absences of letters from a file of correspondence or of speeches from a set, which may mean that the data lack representativeness. (2) The data may lack reliability or validity. An isolated record of a historical event, for example, cannot be checked through comparison of different accounts or through direct observation or questioning by the researcher. Clues to validity can often be obtained, however, by comparing two sets of data believed to reflect the same concept, as does Sorokin (1937–1941) when he shows the parallelism between trends in scientific discoveries and the trends in empirical thought derived from content analysis. (3) Data from differing sociotemporal contexts may not be directly comparable, as sources of information may themselves change over time or from one country to another, or the same categories may take on different meanings. This difficulty requires careful documentation and the search for linguistic equivalences. (4) Finally, the data that come to the researcher in a form he does not fully understand may not fit his definitions of the concepts under scrutiny. Unlike the researcher who handles data he himself has collected, he is often unfamiliar with the circumstances under which the communications originally took place. Yet the content of a diary may depend upon whether it was written for public or private consumption, and the answer to an open question may be affected by interviewer bias. Here the important caveat is to attempt to reconstruct the process by which the data were produced, spelling out and, insofar as possible, offsetting any limitations and biases and recasting the data in a form suitable for the new problem.
Although the researcher may on occasion have to reject given data because he cannot adequately assess their limitations or find suitable means of compensating for them, the great variety of available data which may in some sense be classified as communications constitutes a highly valuable resource for the further application of content analysis.
Use of measurement
The content analyst makes use of his data to measure his concepts, rather than to describe them in discursive language. His data consist of certain concrete communications of certain concrete individuals (the cases). His conceptual model contains corresponding definitions of particular types of orientations, actions, or characteristics (the properties) of particular types of persons or collectivities. What he does, in effect, is to treat the sense data (the written or spoken words, the gestures or pictures which he observes) as manifestations or indicants of these properties (the ideas which he holds in his mind). Measurement is defined here, then, as the classification of cases (persons, groups) in terms of a given property, according to some rules for selecting and combining appropriate communications data as indicants.
Composite measures. The measurement rules followed in content analysis vary in detail with the study; but in general they are characterized by a two-stage procedure that results in a composite, rather than a simple, measure. The researcher does not simply classify (code) each case as a whole. Rather, he breaks down the total communication into a set of constituent units (e.g., words, assertions, articles, books); he first codes each of these content units separately, and then he recombines the coded units to provide the composite measure. Bales, for example, in classifying his cases (groups) according to various dimensions of interaction, might well have observed an entire small-group session and then assigned over-all ratings (simple measures) to indicate the extent to which, for example, solidarity was expressed or tension-management activities had occurred. Instead, he broke down the property (interaction) into small content units (acts), categorized the behavior act by act, and then counted the number of acts in each category. This composite measure gives a group profile—a distribution of the total number of acts among code categories—by which groups are classified according to the extent to which members show agreement, engage in tension-management activities, and so on.
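The two-stage logic can be shown schematically. In the Python sketch below, acts already coded by an observer (stage one) are combined into a group profile of category proportions (stage two); the coded acts are hypothetical and name only a few of Bales's categories.

```python
# Sketch of the two-stage composite measure: acts are coded one by one,
# then combined into a profile of category proportions for the group.
from collections import Counter

# Stage 1: each act has already been assigned a category by an observer
# (hypothetical data).
coded_acts = ["agrees", "gives suggestion", "shows solidarity",
              "agrees", "shows tension release", "gives suggestion"]

# Stage 2: combine the coded units into a group profile.
counts = Counter(coded_acts)
total = sum(counts.values())
profile = {category: n / total for category, n in counts.items()}

for category, share in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {share:.0%}")
```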
Coding. At the first stage the coding process involves a measuring instrument for assigning to each content unit certain code designations that indicate how much of (or which attributes of) the property it possesses. This instrument consists of (1) a code, or set of code designations. Made up of numerals, symbols, or names of categories, the code lists all the categories marked off on each dimension of each property. (Properties are conceived of as having one or more main dimensions, or aspects; for multidimensional measures, the measures of single dimensions—whether simple or composite—must ultimately be combined to reflect the property as a whole.) The instrument further contains (2) coding instructions, which, on the one hand, define each dimension and its categories in terms of the conceptual model and, on the other hand, specify the kinds of data to be taken as indicants under each category. The coding instrument for a particular study is sometimes taken from an existing body of theory (such as Riesman's “inner-direction” versus “other-direction”); it may be a standard code developed by other researchers (such as the D.R.Q. or the verb-adjective ratio); or it may be developed from the empirical data of the study.
Combining. At the second stage—combining the content units to refer to the communication as a whole—the content analyst may simply count the number of units in each category (e.g., to show the number of favorable and unfavorable assertions or to arrive at the mean percentage of paragraphs devoted to dogma in a sample of religious books). Such frequency counts of similar units have the effect of weighting the category to show how predominant or pervasive that category is within the communication as a whole. Sometimes the units are given equal weight (e.g., Bales 1952); or different weights may be assigned, e.g., for different degrees of attitude intensity, as assessed by judges, or for differing degrees of impact upon an audience; Sorokin (1937–1941) weighted the influence of great thinkers according to the number of special monographs devoted to each.
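Weighted combining differs from simple counting only in that each unit contributes its weight rather than one. A minimal sketch, with hypothetical units and judged intensity weights:

```python
# Sketch of weighted combining: each coded unit carries a weight (here a
# judged intensity from 1 to 3) instead of counting equally. Hypothetical.
coded_units = [("favorable", 3), ("favorable", 1), ("unfavorable", 2)]

totals = {}
for category, weight in coded_units:
    totals[category] = totals.get(category, 0) + weight

grand_total = sum(totals.values())
index = {category: w / grand_total for category, w in totals.items()}
print(index)  # weighted shares of favorable vs. unfavorable assertions
```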
Alternatively, the content analyst may adapt various available procedures to uncover the empirical patterning among different types of units, to show how specific acts, attitudes, or characteristics may fit together within a single communication process. For instance, factor analysis may be used to isolate broad dimensions (as of mental health attitudes expressed in mass media or of sensationalism in the handling of news), or Guttman scaling may be used to uncover cumulative patterns (as of the various duties and functions assigned in state laws to boards of education). Such procedures have the virtue of containing built-in tests of the correspondence between the researcher's conception of the property and the rules that he follows in making his measurements. These tests obviate the necessity of relying entirely upon the investigator's arbitrary judgment for the combining and weighting of indicants.
Such patterning is further disclosed, of course, when the investigator, having completed his measurement of single properties, proceeds to examine the positive or negative correlations between properties. References to the devil and to writing may be found to go together in certain folk tales; or a psychotherapy patient may tend to dissociate his thoughts of mother from his thoughts of homosexuality (e.g., Osgood's contingency analysis in Work Conference on Content Analysis 1959, pp. 54–78). Some content analysts apply statistical tests to estimate the likelihood that such correlations are due entirely to chance, although there are often problems of appropriateness of the particular tests (as when the several communications of selected individuals may not meet the assumptions of statistical independence).
Utility of composite measurement. The characteristic two-stage measurement procedure of first coding and then combining content units often enhances the precision possible in simple over-all measurement, while intercoder reliability is reportedly high. Coding rules can be defined more specifically and coding decisions made more easily for a small content unit than for an entire communication (see Schneider & Dornbusch 1958, pp. 165–169, for a comparison of global ratings of entire books with paragraph-by-paragraph ratings of the same books). Although the detailed procedure is typically more time consuming and laborious, computer programs will without doubt be increasingly used to expedite a number of these operations (e.g., the General Inquirer system for content analysis, Stone et al. 1962). Mathematically, the composite measure is quantitative, a characteristic which often facilitates its use in relation to other measures. Even though each act or unit of meaning may be coded on a nominal scale (described in words as favorable or unfavorable, showing or not showing solidarity, etc.) at the first stage of the procedure, at the second stage, when all these codes are combined, the resultant measure consists of numbers or proportions (of remarks that are favorable or of group activities that show solidarity). Such numerical data are used to classify the individuals or groups along scales that are at least at the ordinal level.
Nevertheless, the quantitative, precise appearance of many composite measures used in content analysis may be deceptive. Without a clear correspondence between measurement operations and the communication process, serious problems of interpretation arise.
Sampling. Just as content analysis requires measurement, it also requires rigorous procedures for sampling. The procedures generally employed by social scientists refer to the selection of both the concrete cases to be studied (the communicating persons or groups) and the communications to be used as indicants. Some content analyses deal with only a single case (e.g., Wilson as a single historical figure or western Europe as a single society). When many cases are studied, so as to separate common properties from those peculiar to exceptional cases, samples are often chosen by standard probability procedures that aim to represent the conceptual universe through the sample selected. A second important aim is to select a sample of cases that will facilitate the analysis— as Osgood chooses samples for comparative analysis of ordinary persons and suicides.
Similar sampling procedures are applied to the determination of which communications will be examined, since it is by no means always necessary to analyze all the writings of a given man, all the meetings of a given group, or all the propaganda of a given country. Selections are often made by stratifying or classifying the major items, such as books, prayers, pictographs, records of single meetings, paragraphs, and then taking a probability sample from each stratum.
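Such a selection can be sketched schematically: items are grouped by type (the strata), and a probability sample is drawn within each stratum. The items, strata, and sampling fraction below are hypothetical.

```python
# Sketch of stratified probability sampling of communications: items are
# grouped by type (the strata), then sampled at random within each stratum.
import random

items = [("book", "Inspirational title A"), ("book", "Inspirational title B"),
         ("sermon", "Sermon 1"), ("sermon", "Sermon 2"), ("sermon", "Sermon 3"),
         ("letter", "Letter X"), ("letter", "Letter Y")]

def stratified_sample(items, fraction, seed=0):
    random.seed(seed)  # fixed seed so the draw is repeatable
    strata = {}
    for stratum, item in items:
        strata.setdefault(stratum, []).append(item)
    sample = []
    for stratum, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend((stratum, m) for m in random.sample(members, k))
    return sample

print(stratified_sample(items, fraction=0.5))
```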
Interpretations
Just as each piece of content analysis uses certain empirical methods to arrive at its findings, it also employs procedures for interpreting the scientific and theoretical significance of the findings by comparing them with the conceptual model. The methods for arriving at such interpretations have been less clearly codified than the empirical methods—many of which have been taken over wholesale from other applications—and there is much discussion and some confusion about the kinds of inferences which are appropriate or valid. Nevertheless, the accumulating body of interpretations derived is now beginning to explain the relationships between communicators, recipients, and the patterned content of the communications themselves. These interpretations shed light on historical changes and dynamic processes of communicative interaction. They often go behind the meanings of the language to the underlying social structure of the group or the psychological state of the individual. Content analysis may show, for example, that—quite apart from the content of communication—in a task group a leader tends to emerge who initiates and receives about half the communication. Or such nonlexical aspects of speech as stuttering or hesitation may reveal the anxiety of a patient in an interview.
Exploration
The content analyst whose main objective is exploratory makes his interpretations by working primarily from data to model, adding new ideas to his theories after he has completed the empirical phase of the research. Here the special character of the composite measurement procedure can be a notable asset. The careful handling of details and the search for patterning among them often serve to clarify the concepts with which the inquiry began, and to uncover latent relationships and processes not immediately apparent to an investigator. Thus, the sociologist can reveal the balance between instrumental communications, through which a group may pursue its goals but which place strains upon its members, and expressive communications, which ease such strains and tend to re-establish the equilibrium (Bales 1952). Or the historian, through composite measures based on subject matter and key words in the clauses of the Grand Remonstrance, passed by the English Parliament in the seventeenth century, can expose its character as a propaganda vehicle rather than a constitutionally important document (Knight 1960). In such exploration the methods of interpretation, though rarely explicit, require creative effort—a jump from evidence to ideas, a sensitivity to potential linkages between empirical clues and existing theory and knowledge.
Inadequate use of theory
The fruitless character of content analysis without careful reference to adequate theory is, unfortunately, all too often overlooked. Complex techniques of measurement and analysis may be applied blindly, without questioning their theoretical relevance. Content may be arbitrarily broken down into units that distort messages by wrenching them from their setting. One-to-one inferences may be drawn, from content descriptions to states of the communicator or his social system, without recognition or assessment of the isomorphism implied (as discussed in George 1959). Little consideration may be paid to the meaning of a particular frequency count, as this might refer to the intensity of an individual's attitude or to the consensus with which several individuals hold the same attitude or to the calculated impact of repetition upon an audience. Yet such oversights in connecting techniques with theory can yield meaningless—even misleading—results.
Hypothesis testing. Errors of sheer empiricism are less likely when the researcher, instead of exploring, can use content analysis to test hypotheses. This is often feasible when the conceptual model is highly enough developed to suggest an interpretation in advance of the content analysis (e.g., Osgood & Walker 1959). In hypothesis testing the researcher cannot avoid an explication of the presumed relationship between theory and operations. Here he uses logical or mathematical reasoning to specify what the expected findings would be if the assumptions of the model were in accord with the facts. Again, of course, any evidence derived from testing the model can only be as good as the model itself; the importance of the evidence is bounded by the imagination and the theoretical grasp with which the research begins.
Supporting analyses. Just as the model of communication includes not only the message unit but also underlying attitudes, behavior patterns, values, and social structures, so content analysis alone cannot provide a full understanding of communication. However ideally executed and interpreted and however widely replicated, the approach must often be supplemented by other approaches, which focus more widely upon the several aspects of the communication process. Thus, precise estimates of the intended meaning of a communication depend on knowledge of the situational and behavioral context (George 1959); the content of therapy cannot be assessed without an outside measure of the recovery of the patient; the interpretation of a suicide letter requires the identification of the writer as actually suicidal or not; the presumed appeals of propaganda or advertising must be checked against the responses of the audience. The very ability of the language to communicate must often be tested—for instance, by Taylor's “cloze” procedure, in which sample recipients are given a message in mutilated form and asked how far they can reconstruct it (see Work Conference on Content Analysis 1959, pp. 78–88).
A full understanding of communication will rest ultimately, of course, upon accumulation of ideas and facts from many related studies. Among these, the findings of content analysis can make a special contribution because of their objectivity. The content analysis of letters by Osgood and Walker, for example, is more open to evaluation and replication by other scholars than the less systematic handling of Polish peasant correspondence by Thomas and Znaniecki; the content analysis by Schneider and Dornbusch is more open than Max Weber's insightful construction of ideal types from the writings of a Benjamin Franklin or a Jonathan Edwards.
Matilda White Riley and Clarice S. Stoll
[Other relevant material may be found in Factor analysis and Scaling.]
BIBLIOGRAPHY
Auld, Frank Jr.; and Murray, Edward J. 1955 Content Analysis Studies of Psychotherapy. Psychological Bulletin 52:377–395.
Bales, Robert F. (1952) 1963 Some Uniformities of Behavior in Small Social Systems. Pages 98–111 in Matilda White Riley, Sociological Research. New York: Harcourt.
Berelson, Bernard 1952 Content Analysis in Communication Research. Glencoe, Ill.: Free Press.
George, Alexander L. 1959 Propaganda Analysis: A Study of Inferences Made From Nazi Propaganda in World War II. Evanston, Ill.: Row, Peterson.
Knight, Oliver 1960 The Grand Remonstrance. Public Opinion Quarterly 24:77–84.
Lasswell, Harold D. 1938 A Provisional Classification of Symbol Data. Psychiatry 1:197–204.
Lasswell, Harold D.; and Leites, Nathan 1949 Language of Politics: Studies in Quantitative Semantics. New York: Stewart.
Mosteller, Frederick; and Wallace, David L. 1964 Inference and Disputed Authorship: The Federalist. Reading, Mass.: Addison-Wesley.
Osgood, Charles E.; and Walker, Evelyn G. 1959 Motivation and Language Behavior: A Content Analysis of Suicide Notes. Journal of Abnormal and Social Psychology 59:58–67.
Schneider, Louis; and Dornbusch, Sanford M. 1958 Popular Religion: Inspirational Books in America. Univ. of Chicago Press.
Shiso no Kagaku Kenkyūkai 1959 Japanese Popular Culture: Studies in Mass Communication and Cultural Change Made at the Institute of Science of Thought, Japan. Edited and translated by Hidetoshi Kato. Rutland, Vt.: Tuttle.
Sorokin, Pitirim A. (1937–1941) 1962 Social and Cultural Dynamics. 4 vols. Englewood Cliffs, N.J.: Bedminster Press. → Volume 1: Fluctuation of Forms of Art. Volume 2: Fluctuation of Systems of Truth, Ethics, and Law. Volume 3: Fluctuation of Social Relationships, War, and Revolution. Volume 4: Basic Problems, Principles, and Methods. See especially Volume 1 and Volume 2.
Stone, Philip J. et al. 1962 The General Inquirer: A Computer System for Content Analysis and Retrieval Based on the Sentence as a Unit of Information. Behavioral Science 7:484–498.
Work Conference on Content Analysis, Monticello, Ill., 1955 1959 Trends in Content Analysis: Papers. Edited by Ithiel de Sola Pool. Urbana: Univ. of Illinois Press.
Content Analysis
"Content analysis" has evolved into an umbrella label that includes various procedures for making reliable, valid inferences from qualitative data, including text, speech, and images. These procedures have improved and expanded due to numerous developments in recent years since this encyclopedia's first edition.
Traditionally, "content analysis" has referred to systematic procedures for assigning prespecified codes to text, such as interviews, newspaper editorials, open-ended survey answers, or focus-group transcripts, and then analyzing patterns in the codings. Some projects will count each specific occurrence within a text, while others will have coders tally the number of column inches assigned a code. Either way, the procedure usually employs a "top down" strategy, beginning with a theory and hypotheses to be tested, developing reliable coding categories, applying these to coding-specified bodies of text, and finally testing the hypotheses by statistically comparing code indexes across documents.
With the increasing popularity of qualitative sociology, content analysis has also come to refer to "grounded" inductive procedures for identifying patterns in various kinds of qualitative data including text, illustrations, and videos. For example, the data might include observers' detailed notes of children's behaviors under different forms of supervision, possibly supplemented with videotapes of those same behaviors. While traditional content analysis usually enlisted statistical analyses to test hypotheses, many of these researchers do not start with hypotheses, but carefully search for patterns in their data.
However, rather than just produce statistical analyses or search for patterns, investigators should also situate the results of a content analysis in terms of the contexts in which the documents were produced. A content-analysis comparison of letters to stockholders, for example, should take into consideration the particular business sectors covered and the prevailing economic climates in which they were written. An analysis of American presidential nomination acceptance speeches should consider that they changed dramatically in form once they started to be broadcast live on national radio. A content analysis may have reliable coding, but the inferences drawn from that coding may have little validity unless the researcher factors in such shaping forces.
Like any expanding domain, content analysis has tended to segment into specialized topics. For example, Roberts (1997) focuses on drawing statistical inferences from text, including Carley's networking strategies and Gottschalk's clinical diagnostic tools. There has also been a stream of instructional books, including several series published by Sage, that focus on particular kinds of qualitative data such as focus-group transcripts. A technical literature has also developed addressing specialized computer software and video-analysis techniques. Nevertheless, the common agenda is analyzing the content of qualitative data. Inasmuch as our lives are shaped by different forms of media, and inasmuch as different analytic procedures can complement one another in uncovering important insights, it makes sense to strive toward integration rather than fragmentation.
Many advances in content-analysis procedures have been made possible by the convenience and power of desktop and laptop computers. In addition, an overwhelming proportion of text documents are now generated on computers, making their text files computer accessible for content analysis. And a revolution in hand-held analogue and digital video cameras, together with computer-based technology for editing and analyzing videotapes, makes new research procedures feasible.
Consider, for example, new possibilities for analyzing responses to open-ended questions in survey research. For years, survey researchers have been well aware that closed-ended questions require respondents to frame how they think about an issue in terms of a question's multiple choices, even when the choice options have little to do with how a respondent views an issue. But the costs and time involved in analyzing open-ended responses meant that such questions were rarely used. Even when they were included in a survey, the interviewers usually just recorded capsule summaries of the responses that omitted most nuances of what was said.
Contrast this then with survey research using today's audio information-capturing technologies. Telephone survey interviewers are guided by instructions appearing on a computer screen. Whenever an open-ended question appears, the interviewer no longer needs to type short summaries of the responses. Instead, a computer digitally captures an audio recording of each open-ended response, labels it, and files it as a computer record. Any audio response can later be easily fetched and replayed, allowing a researcher, for example, to identify a "leaky voice," that is, one indicative of the respondent's underlying emotion or attitude. And the full audio responses are then available to be transcribed to text, including, if desired, notations indicating hesitations and voice inflections. Until computer voice recognition is completely reliable, transcribing usually remains a manual task. But with spreadsheet software (such as Excel) no longer restrictively limiting the amount of text in any one cell, the text of each individual's entire response to a question can be placed in a spreadsheet cell, thus capturing both the closed-ended and open-ended data for a survey into a convenient, single spreadsheet for researchers to analyze.
With the survey data in this convenient form, researchers can then code open-ended responses manually, putting their assigned codes in additional spreadsheet columns. As a teaching exercise, it is instructive to assign students a task such as identifying gender differences among a thousand responses to a broad open-ended question, such as a question asking respondents' views about peace or family values. Students first might sort the spreadsheet by gender in order to read separately samples of male responses and female responses and obtain a sense of what possible gender differences exist. They then develop coding instructions that capture these differences and apply the codes to the entire set of responses. This coding, of course, is better done without knowledge of the respondents' genders, with responses in a random order, and on different respondents than those used to develop the codes. After coding several hundred responses, however, students usually begin to glaze over and soon the most ardent humanist student is asking whether the computer could possibly be of help in assigning codes.
For some kinds of coding, computer help is indeed available in the form of computer programs that assign codes. Such codings can be treated as advisory and then manually confirmed, augmented perhaps by also assigning a weight. Or they may be used as is after being spot-checked for accuracy. Not only can a computer complete huge amounts of tedious coding in minutes, possibly assigning many different types of codings to each text, but these codings may uncover statistically significant frequency differences that human coders would not uncover, if only because computer analysis is so even-handed and untiring. The static created by occasional miscodings may be more than offset by gains from a reliable consistency in making many codings.
Computer coding assignments are usually based on the occurrence of words, particular senses of words, or multiword idioms appearing in the text. For example, the word "father" in a text might be coded as "male," "family member," etc. as well as possibly "authority-role." Computer content-analysis software may search the contexts of words in the text to ferret out and correctly code common word senses. For example, for a national study of people's perceptions of African-American young males on several open-ended questions, it was particularly important for the computer to identify correctly each respondent's usages of such multi-meaning words as "race," "white," "black," and "color."
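The Python sketch below illustrates word-level coding with one crude context rule for a multi-meaning word; both the code assignments and the disambiguation rule are illustrative inventions, far simpler than the context searches such software actually performs.

```python
# Sketch of word-level coding with a crude context rule for one
# multi-meaning word ("race"). Real systems use much richer context tests.
import re

WORD_CODES = {"father": ["male", "family-member", "authority-role"]}

def code_race(tokens, i):
    """Toy rule: 'race' near 'run'/'ran'/'won' is coded as a contest,
    otherwise as an ethnicity reference."""
    window = tokens[max(0, i - 3): i + 4]
    return "contest" if {"run", "ran", "won"} & set(window) else "ethnicity"

def code_tokens(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    codings = []
    for i, token in enumerate(tokens):
        if token in WORD_CODES:
            codings.append((token, WORD_CODES[token]))
        elif token == "race":
            codings.append((token, [code_race(tokens, i)]))
    return codings

print(code_tokens("My father ran the race and won."))
print(code_tokens("Attitudes about race differ by region."))
```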
In addition to developing their own coding categories, researchers may enlist existing computer-scored categories that are relevant to the task at hand. For example, it might be hypothesized that one group being studied is more optimistic and its responses will reflect more "positive thinking" while another is more negative or pessimistic. To code "positive thinking" a researcher may want the computer to apply an existing content-analysis category that includes over 1200 words, word roots, word senses, phrasal verbs, and idioms, thus essentially covering most expressions of "positive-thinking" that occur as infrequently as three times per million words of ordinary English text. A similar category exists for negative-thinking, allowing the investigator to check whether the groups being studied differ in their coded positive thinking, negative thinking, or both. And by enlisting such standard categories, the results obtained in one study can be readily compared with results found in other studies.
Once data have been captured in a convenient format for computer use, they can be analyzed repeatedly. For example, should our now glazed-over students have any energy left after analyzing the responses by gender, they could be given an additional assignment of identifying and coding rural-urban differences in these same open-ended responses. Given so many analyses that can be made, it makes sense to let the computer do what it can, saving manual labor for those types of codings that would be hard to have a computer assign. Even multimedia qualitative-analysis software such as HyperResearch includes some rudimentary tools for automatic assignments.
Moreover, desktop computer software has also become available that identifies patterns in text without having to develop coding categories. For example, SPSS's TextSmart uses an algorithm that groups respondents into clusters based on word-use co-occurrences within responses and then maps these clusters into a two-dimensional grid that uses colors to represent each cluster group. If such an automated inductive procedure can produce additional valid insights that other techniques are likely to overlook, then why not use it too?
Pioneering work in inductive automatic categorizing, such as Iker's (1969), usually enlisted procedures based upon correlation matrixes, such as factor analysis. These procedures tended not to be particularly suited to analyzing text, both because of the shape of word-usage frequency distributions and because of the limited number of words that a correlation matrix could feasibly handle. TextSmart, based upon a word-distance measure, provides much more suitable solutions. Further automatic categorizing procedures may be expected from artificial intelligence, as well as from categorizing techniques being developed for Internet search engines.
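Because TextSmart's own algorithm is proprietary, only a generic analogue can be sketched here: responses become word-count vectors and are grouped by a standard clustering routine (k-means via scikit-learn in this sketch, an assumption rather than the method TextSmart actually uses).

```python
# Generic analogue of inductive clustering by word co-occurrence (not
# TextSmart's own algorithm): responses become word-count vectors that a
# standard k-means routine groups into clusters. Responses are hypothetical.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

responses = ["peace means no more war",
             "peace is quiet time with family",
             "an end to war and conflict",
             "time at home with my family"]

vectors = CountVectorizer().fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, response in zip(labels, responses):
    print(label, response)
```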
Content-analysis research strategies can thus now easily be multipronged, spanning from completely automatic inductive procedures to manual coding. But even manual coding these days is likely to utilize computer software to help coders manage information. Consider these changes in costs and convenience: Unlike mainframe computing of the 1960s and 1970s, when the cost of an hour of computer time was about the same as a coder's wage for several weeks, the marginal cost of using a desktop computer is essentially the electricity it uses. Today's desktop computer is likely to be more than fivefold faster at content-analysis coding than those mainframe computers ever were. They also can access much larger dictionaries and other information in their RAM than was ever feasible on a user partition of a mainframe computer, thus making their coding more accurate and comprehensive. Moreover, a single CD-ROM full of text to be analyzed is easily popped into a desktop computer, whereas in the days of mainframe computing, a comparable amount of text would have to be keypunched on over 3,000 boxes of IBM cards.
Given today's convenience and low cost of computer-based procedures, there is no reason to limit an analysis to one approach, especially if insights gained from one approach will differ from and often complement those gained from another. The limits now lie less in the technology than in the skills, proclivities, and comfort zones of the researchers. Research teams, rather than individual researchers, may prove the best solution, for only in a team made up of people with complementary strengths is one likely to find the full range of statistical, conceptual, intuitive, experiential, and perhaps clinical strengths needed to carry out penetrating, comprehensive content-analysis projects. Moreover, some researchers will prefer to learn from the main trends while others will learn more from studying outlying cases. Some will learn from bottom-line numbers while others will learn more from innovative graphics that highlight information patterns. Some will focus on current data while others will contextualize data historically by comparing them with data in archives. Data that have been gathered and assembled at considerable cost, especially when the gathering imposed on many respondents, merit analyses as thorough and comprehensive as these various procedures collectively offer.
Unfortunately, however, an "either-or" assumption about how to do content analysis has continued to be supported both by books and computer software. Authors who do an excellent job of describing one approach to content analysis, such as Boyatzis (1998), give an impression that an either-or decision has to be made about which approach to use. Some software—especially that ported from mainframe computers or developed for early desktop computers—still may steer or even limit researchers who use it to just one approach. For example, some software packages create specialized data formats such as "classification trees" that then in effect constrain the user to analyses that can be readily derived from that format. Software reviews such as Lewis's (1998) excellent comparison of ATLAS/ti and NUD-IST software have been explicit about what assumptions a researcher buys into when utilizing each package.
Additional leverage in analyzing qualitative information has stemmed from computer-based tools, such as newer versions of HyperResearch and ATLAS/ti, that integrate the handling of multiple media (text, illustrations, and video). Especially as more software comes from countries where there are expert programming skills and programming labor is relatively inexpensive, we can expect ambitious content-analysis software, of which TextAnalyst from Russia (www.megaputer.com) may be a forerunner.
Given continuing content-analysis software developments, those who would like to learn what is currently available are advised to search Internet sites rather than rely upon even recently published materials. One recommended starting point is the Georgia State University content-analysis site (www.gsu.edu/~wwwcom/content.html), which gives links to software web sites (including software mentioned in this article), indexes recent content-analysis publications, and has a mailing list of more than 700 members. Technical reviews of relevant language-analysis tools, such as Berleant's (1995), occasionally appear in computational linguistics journals and web sites. For training, the University of Essex Summer School in Social Science Data Analysis and Collection (www.essex.ac.uk), part of a European consortium's program, has offered a content-analysis module for years.
Given the developments described here, some of the contributions that content analysis should be able to make to sociological research include:
- A major shift from reliance upon closed-ended questions to an appropriate use of open-ended questions that lets people be heard in the ways they frame issues, as well as the ways they think and feel about them, as discussed in detail by Stone (1997)
- A better understanding of both print and television media and their impact on public opinion, both in setting agendas and in influencing opinion intensity, as laid out in Neuman (1989). This will involve research that compares the content of media with the content of opinions. Not only will survey research data be archived and accessible from Internet servers, but full-display media will be accessible from Lexis-Nexis and on-line editions supplied by media providers, as well as television news archives such as those at Vanderbilt University
- Better use of historical qualitative data, including both text and graphic materials, to address such issues as how economic cycles impact ideology, as examined by Namenwirth and Weber (1987), or to uncover cycles of creativity, as demonstrated by Martindale (1990)
- Investigations, several of which are already underway, of both intranet communication patterns within organizations and Internet communications, including analyses of the content of communications over those networks.
There is also, however, good reason for caution. Never before in history has so much qualitative information been available electronically. High-volume image scanning will also further increase the amount of information that can be electronically accessed and content-analyzed. Quite understandably, those agencies responsible for limiting terrorist activities may look on content-analysis procedures as possibly providing early warnings that could save lives. But these procedures can also become tools for a "big-brother" monitoring society. Sociologists have an important role in anticipating these problems and helping resolve them.
references
Berleant, Daniel 1995 "Engineering 'Word Experts' for Word Disambiguation." Natural Language Engineering 1 (4):339–362.
Boyatzis, Richard E. 1998 Transforming Qualitative Information: Thematic Analysis and Code Development. Thousand Oaks, Calif.: Sage.
Iker, Howard, and Norman Harway 1969 "A Computer Systems Approach Toward the Recognition and Analysis of Content." In George Gerbner, Ole Holsti, Klaus Krippendorff, William Paisley, and Philip Stone, eds., The Analysis of Communication Content. New York: Wiley.
Lewis, R. Berry 1998 "ATLAS/ti and NUD-IST: A Comparative Review of Two Leading Qualitative Data Analysis Packages." Cultural Anthropology Methods 10 (3):41–47.
Martindale, Colin 1990 The Clockwork Muse: The Predictability of Artistic Change. New York: Basic Books.
Namenwirth, J. Zvi, and Robert Philip Weber 1987 Dynamics of Culture. Winchester, Mass.: Allen and Unwin.
Neuman, Richard 1989 "Parallel Content Analysis: Old Paradigms and New Proposals." Public Communication and Behavior 2:205–289.
Roberts, Carl W. (ed.) 1997 Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. Mahwah, N.J.: Lawrence Erlbaum.
Stone, Philip J. 1997 "Thematic Text Analysis: New Agendas for Analyzing Text Content." In Carl W. Roberts, ed., Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. Mahwah, N.J.: Lawrence Erlbaum.
PHILIP STONE
content analysis
Content analysis has increasingly made use of linguistics and information science.
In its simplest form, content analysis consists of word counts (for example to create a concordance, establish profiles of topics, or indicate authorship style), but grammatical and semantic improvements have increasingly been sought. These include attempts to ‘lemmatize’, or count variants and inflections under a root word (such that ‘am’, ‘are’, ‘is’, ‘will’, ‘was’, ‘were’, and ‘been’ are seen as variants of ‘be’), and to ‘disambiguate’, or distinguish between different meanings of a word spelt the same (such as ‘a bit of a hole’, ‘a 16-bit machine’, ‘he bit it off’). More ambitiously, content analysis seeks to identify general semantic concepts (such as ‘achievement’ or ‘religion’), stylistic characteristics (including understatement or overstatement), and themes (for example ‘religion as a conservative force’), and this normally requires complex interaction of human knowledge and fast, efficient computing power, typified by a system such as the Harvard General Inquirer. Content analysis has concerns and techniques in common with artificial intelligence although it has to be able to cope with more general and open-ended materials. See also CODING.
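A minimal sketch of lemmatized word counting, with a tiny hand-made lemma table mirroring the 'be' example above (a real lemmatizer would cover the whole vocabulary):

```python
# Minimal sketch of lemmatized word counting. The lemma table merely
# mirrors the 'be' example in the entry; real lemmatizers are far larger.
import re
from collections import Counter

LEMMAS = {"am": "be", "are": "be", "is": "be", "will": "be",
          "was": "be", "were": "be", "been": "be"}

def lemma_counts(text):
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(LEMMAS.get(word, word) for word in words)

print(lemma_counts("He is here. They were here. I will be here."))
```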