The myth of the need for diversity among subjects in theory-testing research
Wolfgang Stroebe
What appears below is the article by Wolfgang Stroebe that was denounced by almost 1400 academics as racist and that contributed to the Perspectives on Psychological Science (PoPS) debacle. It is one of the articles critical of Roberts et al. (2020) that was accepted by the former editor of PoPS, Klaus Fiedler, and that contributed to his defenestration at the hands of authorities who caved to the demands of an academic outrage mob. It is printed here with Professor Stroebe’s permission.
Wolfgang Stroebe
Department of Social and Organizational Psychology
University of Groningen
Grote Kruisstraat 2/1
9712TS Groningen
The Netherlands
Abstract
Roberts and colleagues focus on two aspects of racial inequality in psychological research, namely an alleged underrepresentation of racial minorities and the effects attributed to this state of affairs. My comment focuses only on one aspect, namely the assumed consequences of the lack of diversity in subject populations. Representativeness of samples is essential in survey research or applied research that examines whether a particular intervention will work for a particular population. Representativeness or diversity is not necessary in theory testing research, where we attempt to establish laws of causality. Because theories typically apply to all of humanity, all members of humanity (even American undergraduates) are suitable for assessing the validity of theoretical hypotheses. Admittedly, the assumption that a theory applies to all of humanity is also a hypothesis that can be tested. However, to test it, we need theoretical hypotheses about specific moderating variables. Supporting a theory with a racially diverse sample does not make conclusions more valid than support from a non-diverse sample. In fact, cause-effect conclusions based on a diverse sample might not be valid for any member of that sample.
In their analysis of “racial inequality in psychological research”, Roberts and colleagues (2020) focus on two aspects, namely an alleged underrepresentation of racial minorities at all levels of psychological research and the effects they attribute to this underrepresentation. Hommel (2022) has persuasively argued that the perception of an underrepresentation might be the result of a base-rate fallacy. Therefore, I will focus on the effects such an underrepresentation might have on psychological research. Because space restrictions prevent me from commenting on all these alleged effects, I will focus on the section in which they deplore the lack of diversity among participants in psychological research, which supposedly makes this research unrepresentative.
The lack of representativeness in psychological subject populations is a complaint that has frequently been aired in psychological publications. Already in 1946, McNemar complained, “the existing science of human behavior is largely the science of the behavior of sophomores” (p. 333).[1] More recently, Arnett (2008) argued, “research on the whole of humanity is necessary for creating a science that truly represents the whole of humanity” (p. 603). Because research on all of humanity is a tall order, Henrich et al. (2010) tried to suggest more realistic options, such as having universities create “non-student subject pools – for example, by setting up permanent psychological and behavioral testing facilities in bus terminals, Fijian villages, rail stations, airports and anywhere diverse where subjects might find themselves with extra time” (p. 82). Compared to these proposals, the demand to increase the number of minority subjects in psychological research that highlights race seems extremely reasonable.
Before social psychologists begin to worry how they could guarantee that an experimental manipulation developed to operationalize a theoretical construct for American undergraduates would reflect the same construct for inhabitants of Fijian villages, or for business executives rushing to catch their plane or train, I can reassure readers that representativeness of subject populations is unnecessary in theory-testing research. This argument has frequently been advanced before (e.g., Calder et al., 1982; Mook, 1983; Stroebe & Nijstad, 2009; Stroebe et al., 2018). It is probably due to the popularity of the inductivist notion of “external validity” (Campbell & Stanley, 1966) that these arguments have never gained wider acceptance among psychologists, even though it is quite easy to demonstrate their correctness.[2]
Most psychological theories apply to all of humanity. Although they do not explicitly state this, the fact that they do not specify a particular subpopulation to which they apply implies that they implicitly claim to apply to the totality of mankind. Thus, any subsample of mankind (e.g., American undergraduates) would be appropriate for testing such theories. If a theoretical hypothesis is not supported in an experimental study with a group of undergraduates, then the theory has to be rejected. If the hypothesis is supported, then the theory has been supported.
This does not mean, however, that such confirmation proves a theory to be true (Popper, 1959).[3] A theory can never be proven to be true, because it is impossible to rule out all alternative explanations for a given finding. In testing a theory, a researcher has to translate theoretical variables into manipulations and into measures of the effect of these manipulations. There is always the possibility that these “auxiliary hypotheses” (Gadenne, 1984; Trafimow, 2012), which link abstract and unobservable theoretical concepts to empirical manipulations, could be wrong. Alternatively, the experimenter might not have been successful in eliminating potential third variables that might have been responsible for the effect. But the more rigorous empirical tests a theory has passed, the better supported it can be considered.
The assumption that a theory applies to all of humanity is also only a hypothesis that can be proven wrong. However, conducting research in bus terminals is not a suitable procedure to address this problem. If a theory is supported by experiments in two bus terminals, we cannot be certain that it would not have been rejected by a study conducted in an airport (or Fijian village). Similarly, if a theory is supported in a bus terminal but not in an airport, we do not know the reason for this discrepancy. To be informative, any test of the assumption that a theory applies only to specific subsections of humanity has to be guided by theory.
Let me use a classic study by Hovland et al. (1949) to illustrate this point. They tested whether one-sided or two-sided communications were more persuasive. They used army soldiers as subjects and found no difference: Both types of communications appeared to be equally persuasive. However, when they divided their participants into subgroups according to their level of education, they found two-sided communications to be more effective with the more highly educated participants and one-sided communications to be more effective with individuals with lower levels of education. Thus, if they had conducted their research with samples of undergraduates, they would have concluded that two-sided communications were most effective, and if they had done the study with factory workers, they might have found one-sided communications to be most effective. But most importantly, if they had conducted their study with a random sample of humanity – should such a feat be possible – they would have found no difference. And this last conclusion would have been invalid for most members of their sample. Thus, “it is a misconception to assume that research on the whole of humanity would create a science that truly represents the whole of humanity. If no moderation is expected, any subgroup of the population will do equally well, even the often maligned undergraduate students” (Stroebe & Nijstad, 2009, p. 596). Similarly, if race moderates effects, the findings of a study conducted with a racially diverse sample might result in conclusions that do not apply to any of the racial subgroups in that sample.
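The cancellation described in the Hovland example can be made concrete with a small arithmetic sketch. The numbers are purely hypothetical: the effect sizes (+0.4 and -0.4), the equal subgroup shares, and the helper name `pooled_effect` are assumptions chosen for illustration and are not taken from Hovland et al. (1949). The sketch only shows how two opposite subgroup effects can average out to zero in a mixed sample.

```python
# Hypothetical illustration only: the effect sizes and subgroup shares below
# are invented for this sketch and are NOT taken from Hovland et al. (1949).

def pooled_effect(subgroups):
    """Return the share-weighted mean effect across subgroups.

    Each subgroup is a (share_of_sample, effect) pair, where `effect` is the
    advantage of a two-sided over a one-sided message (positive favours
    two-sided, negative favours one-sided).
    """
    total_share = sum(share for share, _ in subgroups)
    return sum(share * effect for share, effect in subgroups) / total_share

# Assumed subgroups: half the sample more highly educated, half less so,
# with effects of equal size and opposite sign.
higher_education = (0.5, +0.4)
lower_education = (0.5, -0.4)

mixed_sample = pooled_effect([higher_education, lower_education])
undergraduates_only = pooled_effect([higher_education])
factory_workers_only = pooled_effect([lower_education])

print("mixed sample:", mixed_sample)
print("higher education only:", undergraduates_only)
print("lower education only:", factory_workers_only)
```

Under these assumed values the mixed sample shows no overall effect, although neither subgroup has a zero effect. The pooled result therefore describes no one in the sample, which is the point of the example above.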
Therefore, a psychological science that acknowledges potential racial differences in psychological processes has to start out with theories about such differences. To be psychologically meaningful, such differences have to be linked to measurable psychological constructs. Merely demonstrating that minority subjects respond differently from white participants is not psychologically meaningful or interpretable, unless that difference can be attributed to some psychological construct (e.g., attitude, personality trait). However, once such a construct has been identified, it is likely that the racial difference in responses is due to the fact that this particular construct is more or less frequent among different racial groups. Thus, I would not expect that any of the recommendations made by Roberts and colleagues (2020) (e.g., “establish a diversity task force”; “release public diversity reports annually”) is likely to result in a psychological science that reflects racial diversity.
Representativeness of samples, although not important for theory-testing research, is important for many forms of applied research. For example, when researchers aim to determine the percentage of a population that has a particular characteristic (e.g., votes Republican; holds a particular attitude), representativeness of samples is important. In this research, one does not attempt to establish laws of causality, but wants to infer from a small sample how certain features are distributed in a large population. Similarly, if one wants to assess the effectiveness of a planned mass media campaign to persuade Fijian villagers to eat more vegetables, one must pretest that intervention with respondents who are representative of that population (i.e., Fijian villagers rather than American undergraduates). But this limitation does not apply to theory testing, which, after all, is the main focus of research conducted in psychology departments.
References
Arnett, J. J. (2008). The neglected 95%: Why American psychology needs to become less American. American Psychologist, 63, 602-614.
Calder, B. J., Phillips, L. W., & Tybout, A. M. (1982). The concept of external validity. Journal of Consumer Research, 9(3), 240-244.
Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago, IL: Rand McNally.
Christie, R. (1965). Some implications of research trends in social psychology. In O. Klineberg & R. Christie (Eds.), Perspectives in social psychology (pp. 141-152). New York: Holt, Rinehart & Winston.
Gadenne, V. (1976). Die Gültigkeit psychologischer Untersuchungen. (The validity of psychological Research). Stuttgart, Germany: Kohlhammer.
Hommel, B. (2022). Dealing with diversity in Psychological science or ideology? Perspectives on Psychological Science,
Hovland, C. I., Lumsdaine, A. A., & Sheffield, F. D. (1949). Experiments on mass communication. Princeton, NJ: Princeton University Press.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.
Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38(4), 379-387.
McNemar, Q. (1946). Opinion-attitude methodology. Psychological Bulletin, 43(4), 289-374.
Popper, K. (1959). The logic of scientific discovery. London, UK: Routledge.
Roberts, S. O., Bareket-Shavit, C., Dollins, F. A., Goldie, P. D., & Mortenson, E. (2020). Racial inequality in psychological research: Trends of the past and recommendations for the future. Perspectives on Psychological Science, 15(6), 1295-1309.
Trafimow, D. (2012). The role of auxiliary assumptions for the validity of manipulations and measures. Theory & Psychology, 22(4), 486-498.
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9(1), 59-71.
Stroebe, W., Gadenne, V., & Nijstad, B. A. (2018). Do our psychological laws apply only to college students? External validity revisited. Basic and Applied Social Psychology, 40(6), 384-395.
Stroebe, W., & Nijstad, B. (2009). Do our psychological laws apply only to Americans? American Psychologist, 64, 569.
[1] His complaint was either premature or prescient, because in 1949 the proportion of studies based on undergraduate participants in JASP was only 20% (Christie, 1965).
[2] The concept of external validity is inductivist, because it asks “To what populations, settings, treatment variables, and measurement variables can this effect be generalized?” (Campbell & Stanley, 1966, p. 5). In theory-testing research, we do not generalize from effects. The theory specifies the class of people to whom it applies.
[3] As I have argued elsewhere, the same applies to rejections (e.g., Stroebe & Strack, 2014).
The slight weakness of this argument is that if psychologists keep going back to the *same* subset of humanity for multiple studies (which they do), then they build up a complex and interrelated theory of human psychology that consistently assumes everyone behaves like second-year psych students. After the third, fourth, or fifth study on undergrads produces the same or compatible results, researchers do treat it as settled science about everybody, which they wouldn't if equal proportions of research were done on undergrads, at bus stops, and in Fijian villages.
However, continually testing the same parameters of diversity (white undergrads vs. black undergrads) probably doesn't add much.
I think the underlying premise of this piece is that the opposition wants to adhere to reciprocity and logical arguments.
Look how little effort it takes them to hold two opposite positions: no differences between races/sexes implies that calling for diversity is irrational.
Psychology and many other areas have been thoroughly feminised and I fail to see how logic can be used to oppose toxic empathy.