Quasi-experimental data?

Stan Lieberson is one of a group of sociologists for whom I have great respect when it comes to intelligent thinking about social science methodology. His 1985 book, Making It Count: The Improvement of Social Research and Theory, is a good example of his thinking about the foundations of social science knowledge, and I also admire A Matter of Taste: How Names, Fashions, and Culture Change for the genuinely novel topic and method of approach it offers.

Lieberson urges us to consider "a different way of thinking about the rigorous study of society implied by the phrase 'science of society'" instead of simply assuming that social science should resemble natural science (3-4). His particular object of criticism in this book is the tendency of quantitative social scientists to use the logic of experiments to characterize the data they study.

An experiment is an attempt to measure the causal effect of one factor X on another factor Z by isolating a domain of phenomena -- holding constant all other causal factors -- and systematically varying X to observe its effect on the outcome of interest, Z. The basic assumption is that an outcome is the joint effect of a set of (as yet unknown) causal conditions:

C1 & C2 & ... & Cn cause Z,

where we do not yet know the contents of the list Ci. We consider the hypothesis that Cm is one of the causes of Z. We design an experimental environment in which we are able to hold constant all the potentially relevant causal conditions we can think of (thereby holding fixed the other Ci), systematically vary the presence or absence of Cm, and observe the state of the outcome Z. If Z varies appropriately with the presence or absence of Cm, we tentatively conclude that Cm is one of the causes of Z.
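To make this logic concrete, here is a toy sketch in Python. The rule `outcome_z` is a hypothetical stand-in for an unknown conjunction C1 & C2 & Cm; the experimenter cannot see it and can only fix the background conditions and toggle Cm on and off.

```python
# Hypothetical "true" causal rule, unknown to the experimenter:
# Z occurs only when C1, C2 and Cm are all present (a conjunctive cause).
def outcome_z(c1: bool, c2: bool, cm: bool) -> bool:
    return c1 and c2 and cm


def experiment(background: dict) -> dict:
    """Hold the background conditions fixed and vary only Cm."""
    return {cm: outcome_z(background["c1"], background["c2"], cm) for cm in (False, True)}


# Background held constant with C1 and C2 present: Z tracks Cm.
print(experiment({"c1": True, "c2": True}))   # {False: False, True: True}
# Background held constant with C2 absent: Z never occurs, so Cm looks inert.
print(experiment({"c1": True, "c2": False}))  # {False: False, True: False}
```

The second call hints at why the conclusion is only tentative: if some other necessary condition happens to be held fixed at "absent", Cm appears to make no difference even though it is one of the causes of Z.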

In cases where individual differences among samples or subjects may affect the outcome, or where the causal processes in question are probabilistic rather than deterministic, experimentation requires treating populations rather than individuals and ensuring that subjects are randomly assigned to "treatment" and "no-treatment" groups. This involves selecting a number of subjects, randomly assigning them to controlled conditions in which all other potential causal factors are held constant, exposing one group to the treatment X while withholding it from the other, and measuring the outcome variable in both groups. If there is a statistically significant difference in the mean value of the outcome variable between the treatment group and the control group, we can tentatively conclude that X causes Z and perhaps estimate the magnitude of the effect. Consider the effect of fertilizer X on tomato yield per square meter (Z): plots in the control group are subjected to a standard set of growing conditions, while plots in the treatment group receive these conditions plus a measured dose of X. We then measure the yield of the two groups of plots and estimate the effect of X. The key ideas here are causal powers, random assignment, control, and single-factor treatment.
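A minimal simulation of such a trial follows, built on invented numbers: each plot has its own baseline yield (around a hypothetical 10 kg per square meter, with random plot-to-plot variation), fertilizer X is assumed to add exactly 2 kg per square meter, and plots are randomly split into treatment and control halves. Because assignment is random, the plain difference in group means recovers the built-in effect, and the standard error gives a rough sense of whether that difference is significant.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # hypothetical effect of fertilizer X on yield (kg per square meter)


def run_trial(n_plots: int = 4000, seed: int = 1) -> tuple[float, float]:
    rng = random.Random(seed)
    # Each plot has its own baseline yield (soil, light, etc.): individual differences.
    baseline = [10.0 + rng.gauss(0.0, 1.5) for _ in range(n_plots)]
    # Random assignment: shuffle the plot indices and treat the first half.
    idx = list(range(n_plots))
    rng.shuffle(idx)
    treated = set(idx[: n_plots // 2])
    treat_y = [baseline[i] + TRUE_EFFECT for i in range(n_plots) if i in treated]
    ctrl_y = [baseline[i] for i in range(n_plots) if i not in treated]
    # Estimated effect: difference in mean yield between the two groups.
    diff = statistics.mean(treat_y) - statistics.mean(ctrl_y)
    # Standard error of the difference in means (unequal-variance form).
    se = (statistics.variance(treat_y) / len(treat_y)
          + statistics.variance(ctrl_y) / len(ctrl_y)) ** 0.5
    return diff, se


diff, se = run_trial()
print(f"estimated effect = {diff:.2f} kg/m^2, t = {diff / se:.1f}")
```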

Lieberson insists, however, that most social data are not collected under experimental conditions. It is normally not possible to randomly assign individuals to groups and then observe the effects of interventions. Nor is it possible to systematically control the factors that are present or absent for different groups of subjects. If we want to know whether "presence of hate speech on radio broadcasts" causes "situations of ethnic conflict" to progress to "situations of ethnic violence", we don't have the option of identifying a treatment group and a control group among current situations of ethnic conflict, and then examining whether "treatment" with hate speech on radio broadcasts increases the incidence of ethnic violence in the treatment group relative to the control group. And, Lieberson argues, it is fallacious to reason about non-experimental data using the assumptions developed for the analysis of experiments. This fallacy involves making "assumptions that appear to be matters of convenience but in reality generate analyses that are completely off the mark" (6).

Suppose we want to investigate whether being a student athlete affects academic performance in college. To treat this topic experimentally we would need to select a random group of newly admitted students; randomly assign half of them to athletic programs and the other half to a non-athletic regime; and measure the academic performance of each individual after a period of time. Say that GPA is the performance measure and that the athlete group has a mean GPA of 3.1 while the non-athlete group has a mean of 2.8. If this difference were statistically significant, it would be experimental confirmation of the hypothesis that "participation in athletics improves academic performance."

This thought experiment illustrates a common problem with social data: it is not possible to perform the experiment. Students decide for themselves whether they want to compete in athletics, and their individual characteristics help determine whether they succeed. Instead, we have to work with the social realities that exist, which means identifying a group of students who have chosen to participate in athletics, comparing them with a "comparable" group of students who have chosen not to, and measuring the academic performance of the two groups. But here we have to confront two crucial problems: selectivity and the logic of "controlling" for extraneous factors.

Selectivity comes in when we consider that the same factors that lead a college student to participate in athletics may also influence his/her academic performance; so the difference between the two groups may reflect only the pre-existing differences that sorted students into one group or the other -- not the effect of participating in athletics on academic performance. To correct for selectivity, the researcher may attempt to control for potentially influential differences between the two groups, such as family factors, socio-economic status, performance in secondary school, and a set of psycho-social variables. "Controlling" in this context means selecting sub-groups within the two populations that are statistically similar with respect to the variables to be controlled for. Suppose Group A and Group B have approximately the same distribution of family characteristics, parental income, and high school GPA, so that the individuals in the two groups are "substantially similar". We have "controlled" for these potentially relevant causal factors, and so we infer that any observed difference in academic performance between the two groups can be attributed to the treatment, "participation in athletics."
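The following sketch, using entirely made-up numbers, simulates the student-athlete case with a single observed confounder. Students fall into three hypothetical high-school GPA bands; a higher band both raises college GPA and makes choosing athletics more likely, while athletics itself is given no effect at all. The naive comparison shows an athlete "advantage"; comparing athletes and non-athletes within each band (simple stratification, one way of "controlling") removes it.

```python
import random
import statistics


def simulate(n: int = 30000, seed: int = 7):
    """Return (hs_band, athlete, college_gpa) records; athletics has NO true effect."""
    rng = random.Random(seed)
    p_athlete = {0: 0.2, 1: 0.4, 2: 0.6}  # hypothetical selection rates by band
    rows = []
    for _ in range(n):
        band = rng.randrange(3)                   # 0 = low, 1 = mid, 2 = high school GPA
        athlete = rng.random() < p_athlete[band]  # self-selection depends on band
        gpa = 2.4 + 0.4 * band + rng.gauss(0.0, 0.3)  # college GPA depends on band only
        rows.append((band, athlete, gpa))
    return rows


def mean_gap(rows):
    """Mean GPA of athletes minus mean GPA of non-athletes."""
    a = [g for _, ath, g in rows if ath]
    b = [g for _, ath, g in rows if not ath]
    return statistics.mean(a) - statistics.mean(b)


def stratified_gap(rows):
    """Compare athletes and non-athletes within each band, weighting by band size."""
    total = 0.0
    for band in range(3):
        stratum = [r for r in rows if r[0] == band]
        total += mean_gap(stratum) * len(stratum) / len(rows)
    return total


rows = simulate()
print(f"naive gap: {mean_gap(rows):+.3f}")
print(f"gap within high-school bands: {stratified_gap(rows):+.3f}")
```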

But Lieberson makes a critical point about this approach: there is commonly unmeasured selectivity within the categories of the control variables themselves. Crudely, students with the same family characteristics, parental income, and high school GPA who have chosen athletics may nonetheless differ from those who have not, in ways that influence academic performance. As Lieberson puts the point, "quasi-experimental research almost inevitably runs into a profound selectivity issue" (41).
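Extending the same kind of made-up simulation, suppose there is also an unmeasured trait, here labeled `drive` (a hypothetical name), that within every high-school band makes students both more likely to choose athletics and more likely to earn good grades. Athletics again has no true effect. Controlling for band alone still leaves a spurious athlete "advantage"; only stratifying on the unobserved trait as well removes it, and that is exactly the comparison a real researcher cannot make.

```python
import random
import statistics


def simulate(n: int = 30000, seed: int = 11):
    """Records of (hs_band, drive, athlete, gpa); athletics has NO true effect.

    'drive' is a hypothetical trait that the researcher never observes.
    """
    rng = random.Random(seed)
    base = {0: 0.2, 1: 0.4, 2: 0.6}  # hypothetical selection rates by band
    rows = []
    for _ in range(n):
        band = rng.randrange(3)
        drive = rng.random() < 0.5
        # Unobserved drive shifts the chance of choosing athletics within every band...
        athlete = rng.random() < base[band] + (0.3 if drive else -0.1)
        # ...and also raises GPA, independently of athletics.
        gpa = 2.4 + 0.4 * band + (0.3 if drive else 0.0) + rng.gauss(0.0, 0.3)
        rows.append((band, drive, athlete, gpa))
    return rows


def mean_gap(rows):
    """Mean GPA of athletes minus mean GPA of non-athletes."""
    a = [r[3] for r in rows if r[2]]
    b = [r[3] for r in rows if not r[2]]
    return statistics.mean(a) - statistics.mean(b)


def stratified_gap(rows, key):
    """Athlete/non-athlete gap within strata defined by key(row), weighted by stratum size."""
    strata = {}
    for r in rows:
        strata.setdefault(key(r), []).append(r)
    return sum(mean_gap(s) * len(s) for s in strata.values()) / len(rows)


rows = simulate()
print(f"controlling for band only:      {stratified_gap(rows, key=lambda r: r[0]):+.3f}")
print(f"controlling for band and drive: {stratified_gap(rows, key=lambda r: (r[0], r[1])):+.3f}")
```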

There is much more careful, rigorous analysis of social-science reasoning in the book. Lieberson crosses over between statistical methodology and philosophy of social science in a very useful way, and what is most fundamental is his insistence that we need to substantially rethink the assumptions we make in assigning causal influence on the basis of social variation.