Using AI to Reveal Bad, Biased, and Bullshit Published Scholarship
Claude Slays a Vampire, An AI-Enhanced Wooden Stake Essay
This post is jointly authored by Rob Sica and me.1 Rob is a Humanities & Social Sciences Librarian at Colorado State. Social scientists cannot adequately vet scholarship in their own fields? Librarians to the rescue!
Summary Up Front
Wooden Stake Essays are ones in which I (or, in this case, Rob, Claude, and I — “who is Claude,” you ask? Read on) debunk claims in some published scholarship (killing the vampires that suck the blood out of social science). This is hard, time-consuming work.
AI to the rescue! It promises to streamline this process and maybe even, not too far down the road, put doing so on an industrial assembly line scale.
I posted the critique of microaggression scholarship shown below a few months ago (it was a draft of an article, which, slightly revised, has now been submitted for publication in a peer-reviewed journal):
A review of microaggression research was recently published (screenshot below) making claims typical of those commonly made by microaggression advocates.
Into this mix stepped Rob Sica, who asked the AI, Claude, to critically evaluate the review article against the criticisms of microaggression research in my blog post/submitted manuscript.
Its answer was a doozy. Quoting Claude: “This article exhibits nearly all the methodological problems Jussim identifies.”

CONTEXT: The Microaggression Controversies
In 2017, the great psychologist Scott Lilienfeld published a wickedly good critique of microaggression research:

In two articles, Monnica Williams (third author on the main article in the summary above) claimed to have completely debunked Lilienfeld’s criticisms.

It would be useful to ask Claude to evaluate how well Williams addressed Lilienfeld’s criticisms, but I have not done that. For you, my ambitious and motivated readers, this would be an excellent homework assignment: ask an AI (Claude, ChatGPT, your AI of choice is fine) to ingest Lilienfeld’s critique and Williams’ response. Then ask it how well Williams addressed Lilienfeld’s criticisms. This essay will lay out an example of how to do that. Feel free to post the result in the comments (if you can keep it short) OR propose it as a full guest essay here.
Anyway, not only did Williams claim to address all of Lilienfeld’s criticisms, she called his criticism of microaggressions racist in another article.

In it, you will find this table.
I looked through several of Lilienfeld’s papers on microaggressions, and none had the word “unproven” and none had the phrase “not harmful.” However, it’s not presented as a quote and maybe he did write something that equates to that characterization. He certainly did skeptically challenge the rigor of the scientific claims about microaggressions, which stops a bit short of the characterization, raising the possibility that this is a case of Cathy Newmanism (from The Orwelexicon):
Having roundly criticized microaggression research myself, I expressed my bitter disappointment at Williams and company not denouncing my criticism as “advancing racism” as well here:
A Note on the Epistemic Perversion of Characterizing Scientific Criticism of Scholarship Self-Proclaimed to be “Anti-Racist” as Racism
Our (co-authored with the eminent Harvard clinical psychologist, Richard McNally) revised microaggression critique, the one actually submitted for publication, includes this:
Microaggression scholarship has a peculiar form of insulation against criticism. It often is framed as anti-racist. If one presumes that criticizing anti-racism is racist, then criticizing microaggression research is racist. This is epistemically2 perverse because it permits microaggression advocates to deflect any scientific criticism or skepticism as racism.
…
This rhetorical move – accusations and implications of racism for criticizing microaggression scholarship – replaces scientific defense with ad hominem reputational impugnment of the critics. It attempts to disqualify the organized skepticism on which scientific validity and credibility rests. There is an ugly history of shoddy “science” being produced when scientists are cowed into silence through repressive authorities or conformity induced by fear of public denunciations.
Ok, enough background. Let’s get to Claude’s analysis.
The Question Rob Sica Put to Claude (the AI)
In light of Jussim's critique of microaggression research, critically evaluate the attached article. Feel free to dig into the citations used in the article to support its claims. Here's Jussim's critique: https://unsafescience.substack.com/p/research-on-microaggressions-and
///Lee here. To be clear, the “attached article” refers to this, linked above///
CLAUDE’S RESPONSE
I’ll examine this article critically in light of Jussim’s concerns about microaggression research. Let me first review Jussim’s critique and then evaluate the paper you’ve provided.
Now I can provide a thorough critical evaluation of the Faber, Zare, and Williams article through the lens of Jussim’s critique [Lee’s note: This is really my critique co-written with McNally]. This is a substantive analysis, so let me work through it systematically.
Critical Evaluation: Faber, Zare & Williams (2026) vs. Jussim’s Critique
Core Methodological Issues
1. Perception vs. Reality Problem
The article falls directly into Jussim’s primary criticism. Throughout the paper, the authors conflate perceptions of microaggressions with actual microaggressions. For example:
The abstract states microaggressions “can erode psychological well-being” (treating them as objective phenomena)
Page 5 defines microaggressions as “subtle, everyday expressions of racism” but then exclusively cites research measuring self-reported experiences
Table 1’s “Key Findings” column describes effects of “racial microaggressions” when the cited studies only measured perceptions
None of the studies cited in this review actually measured microaggressive behaviors of perpetrators. They all rely on target self-reports, yet the authors write as if the phenomenon itself has been established.
2. The Causation Problem
The article makes repeated causal claims based entirely on correlational evidence:
Abstract: “these subtle forms of racial harm...can erode psychological well-being” (causal claim)
Page 6: “constant chronic exposure leads to stress and eventually trauma” (causal chain)
Table 1: Lists “implications” as if causation were established
Section header (p.7): “Racial Microaggressions” as a theme with causal “implications for relationships”
The authors use what Jussim calls “persuasive communication devices” - words like “impact,” “erode,” “undermine,” and “compound” that smuggle in causation. On page 6, they explicitly acknowledge using the language of “impact” while only having correlational data, yet continue using causal language throughout.
3. Quality of Cited Evidence
Let me examine some key citations:
Study [1] - Gonlin & Belser (2025) - Marked as “special interest”
This is interview research with 82 Black women about their perceptions of their white male partners’ responses to racism
It cannot establish: (a) whether microaggressions occurred, (b) whether partners’ responses actually matched the women’s perceptions, or (c) causation
Yet it’s used to support claims about how microaggressions “erode” relationships
Study [3] - Yampolsky et al. (2023) - Marked as “outstanding interest”
Uses self-report measures of “intimate racism”
The authors acknowledge it examines “how intimate racism affects young intercultural couples” - but this is based on target perceptions, not demonstrated behaviors
No validation of whether reported “racist attitudes or microaggressions” actually occurred
Study [11] - Williams (2021): “Microaggressions are a form of aggression”
This is Monnica Williams (the senior author of the current paper) making a definitional argument
Jussim specifically critiques Williams’s work for conflating perceptions with reality
Using one’s own theoretical papers to establish empirical facts is circular
4. The “Experts” Problem
The article repeatedly relies on authority rather than empirical demonstration:
Definitions of microaggressions come from theoretical papers, not empirical validation
The framework treats microaggressions as established phenomena requiring only better understanding of their effects
No engagement with critiques (like Lilienfeld 2017, which Jussim discusses)
The senior author (Williams) is one of the field’s main advocates, creating potential for confirmatory bias
Specific Problematic Claims
On Frequency (Page 5): The article claims microaggressions are “pervasive” and involve “constant chronic exposure.” But Jussim notes that Nadal’s (2011) own data showed people reported experiencing supposed microaggressions only about 3 times in 6 months - hardly “constant” or “chronic.”
On Causation (Throughout):
“When microaggressions are dismissed or minimized by others...the resulting psychological harm, akin to gaslighting, undermines trust” (p.6)
This assumes: (1) microaggressions occurred, (2) dismissal caused harm, (3) alternative explanations don’t apply
Alternative explanations never considered:
Could neuroticism or negative attributional style cause both perception of microaggressions AND relationship distress?
Could relationship distress cause heightened sensitivity to ambiguous behaviors?
Could reporting bias explain correlations (people in distress may interpret past events more negatively)?
On Measurement (Table 1): Studies listed under “Racial Microaggressions” theme cite research using scales like Nadal (2011) that Jussim specifically critiques for:
Not validating whether reported experiences correspond to actual behaviors
Not testing whether behaviors stem from racism
Finding low frequencies that contradict “pervasive” claims
The Circular Logic Problem
The article exemplifies what Jussim calls microaggression scholarship’s “peculiar form of insulation against refutation”:
Microaggressions are defined as subtle racist acts (p.5)
Any questioning of whether specific behaviors are microaggressions becomes “invalidation”
Invalidation is itself characterized as harmful and a form of “racial gaslighting” (p.6)
Therefore, skepticism about microaggressions confirms their existence and harmfulness
This makes the framework unfalsifiable - a key warning sign in science.
What Would Better Evidence Look Like?
Following Jussim’s logic, stronger evidence would require:
Behavioral validation: Studies correlating target reports with independent observations of perpetrator behavior
Motivation assessment: Testing whether behaviors labeled as microaggressions actually stem from racism vs. other causes
Experimental evidence: Manipulating actual microaggressive behaviors (not just perceptions via vignettes) and measuring effects
Causal modeling: Articulating specific causal mechanisms, testing them against alternatives, and using methods beyond simple correlation
Representative samples: Moving beyond convenience samples of college students
None of these exist in the literature reviewed by this paper.
The Clinical Implications Problem
The conclusion offers advice for “White partners” to provide “racially attuned support” (p.9), including:
“Recognize when/if your partner is actively learning” about racism
Develop “racial literacy, accountability, and the willingness to see clearly”
But if the empirical foundation is weak, these recommendations risk:
Training people to interpret ambiguous behaviors as racist (potentially harmful)
Creating a framework where the white partner is always presumed wrong
Inducing what Jussim calls “a cognitive mindset that sees hostility in ambiguous interactions”
This is exactly what Jussim warns against: premature real-world application of unvalidated constructs.
Positive Aspects
To be fair, the article does have some strengths:
It acknowledges racial trauma is real (distinct from whether microaggressions cause it)
It recognizes the complexity of interracial relationships
The emphasis on empathy and perspective-taking could be valuable regardless of microaggression validity
Table 1 properly labels findings as “implications” rather than proven effects (though the text doesn’t maintain this distinction)
Bottom Line
This article exhibits nearly all the methodological problems Jussim identifies:
✗ Treats subjective perceptions as objective phenomena
✗ Makes causal claims from correlational data
✗ Doesn’t consider alternative explanations
✗ Relies on unvalidated “expert” definitions rather than empirical demonstration
✗ Uses the framework’s assumptions to validate itself
✗ Recommends real-world applications despite weak evidence base
✗ Frames criticism as part of the problem being studied
After 50+ years of research, the field still hasn’t established that the behaviors they label as “microaggressions” (a) actually occur as described, (b) stem from racism, or (c) cause the harms attributed to them. This review paper, published in January 2026, shows the field hasn’t addressed Jussim’s fundamental critiques.
The tragedy is that real racism exists and causes real harm. But studying it poorly doesn’t help address it - it potentially makes things worse by promoting a framework that may increase conflict, reduce interpersonal generosity, and pathologize ambiguous interactions.
END CLAUDE
The March of the AIs
Lee here. AIs sometimes get stuff wrong. I welcome verification. Claude did an excellent job of summarizing our critique of microaggression research, but you can read it yourself (it’s linked in the beginning of this essay) and reach your own conclusions. I’ve also read the Faber et al. article, and I believe Claude’s description and critique is a dead-on bullseye.
But here’s the thing. Even if you include Rob’s accessing Claude, and posting the papers for it to ingest, and thinking about and then posing the question, and Claude’s processing time — the whole thing took about 10 minutes. It would have taken me most of two days.
Rob also put a recent major review of “implicit bias” through the tests for “propaganda scholarship” Nate Honeycutt & I described here.
The picture was not pretty; this probably will be the basis for an upcoming post. However, this got me worried — does Claude always conclude papers are bad for the reasons I have laid out elsewhere? I mean, even I do not think all papers are bad or biased political propaganda. But no, Claude does not evaluate all papers as meeting my standards for bad scholarship. Rob put a paper to the same test that did not have a whiff of propaganda or political bias and Claude basically said, “This paper is clean as a whistle!”
I have been doing this sort of thing (critically evaluating scholarship for political biases, propaganda, and a variety of other unjustified or misleading claims) for years now and, in addition to posts at Unsafe Science (such as here, here, and here, but there are many more), managing to publish them as academic papers, which you can find at my Rutgers website.
But the thing is, that sort of debunking is hard. It requires articulating a priori standards for bullshit detection (such as here or here). It requires reading the paper-that-might-deserve-debunking very slowly and carefully. It requires checking their citations — i.e., carefully reading those papers — to determine whether they actually support the claim they are cited to support. It requires demonstrating and articulating just how the paper-in-question has gone wrong. This is painstaking, time-consuming work. Might take me two full days or more, to do all that reading and then write it up, even for a blog post.

The slow, painstaking nature of such debunking has so far meant that it was impossible to actually demonstrate the general extent of propaganda masquerading as scholarship or otherwise politically biased academic papers. It was just not possible to subject vast literatures to this sort of analysis. It was very difficult, though not always impossible (the citation analyses here reveal a pervasive bias in the area of sex bias in academic hiring), to produce much more than anecdotal evidence of such biases.
But having an AI, such as Claude, do this could be a game changer if we find that: 1. it works consistently; and 2. it continues to show adequate discernment, distinguishing between, e.g., publications that show ample evidence of being propaganda and those that do not. (I am in discussions with several people about scaling this up and writing it up for possible publication.)
The results reported here are proof of concept, so it’s just the beginning, albeit a useful and productive one.
Now, imagine doing this with scores, or even hundreds, of the most influential papers published in the most prestigious academic journals.
Epilogue and 1/28/26 Update: We Must be Doing Something Right
I posted this essay on X a couple of days ago. Most of the response was quite enthusiastic, but I also received this:
“Who is Dylan Riley?” I asked myself. And, duh, I should have guessed. His work is just the type of propaganda masquerading as “sociological” scholarship that has gotten the academy in so much trouble, not just with Republicans, but with other academics and many other Americans (he is a sociologist at … can you guess? … wait for it … Berkeley!). This is exactly the type of thing some of us have been warning about for decades:
And, duh! Academic promoting propaganda masquerading as scholarship objects to … methods for revealing propaganda masquerading as scholarship! See footnote 3 for two examples of his own bullshit propaganda scholarship.3 This is exactly the type of academic I hope to be denounced, insulted and ridiculed by. Strongly indicates we got it right. Kinda reminds me of this:
From that essay:
I have had many accomplishments, awards and honors over the years. I have chaired three different programs at Rutgers, founded a professional society, and advised grad students who have gone on to superb careers inside and outside of academia, and undergraduates who themselves have received awards for their research.
But perhaps the thing I am most proud of, my finest hour, happened about two weeks ago. My (and my collaborators’) research was denounced by the Communist Party of China (hence, CCP) last week.
Commenting
Before commenting, please review my commenting guidelines. They will prevent your comments from being deleted. Here are the core ideas:
Don’t attack or insult the author or other commenters.
Stay relevant to the post.
Keep it short.
Do not dominate a comment thread.
Do not mindread; it’s a loser’s game.
Don’t tell me how to run Unsafe Science or what to post. (Guest essays are welcome and inquiries about doing one should be submitted by email).
Footnotes
2 Epistemic. This is one of those GRE words that normal people almost never use. It basically means “how we go about figuring stuff out.” Basic statistical/methodological epistemology says stuff like “don’t assume a particular causal direction between two variables based on a mere correlation.” Basic critical theory epistemology says “lived experience” is more informative than scientific studies on the same topic. If you think I am exaggerating, I have another article by microaggression advocates for you:
3 Riley’s propaganda scholarship. I looked up some of Professor Riley’s “scholarship.” You can find his vita here. From this paper:
Bourdieu’s sociology, however, offers something more than a generalization of the “professorial” experience. It also offers an identity, one with certain parallels to what Lenin called the “professional revolutionary.” Bourdieusian sociologists are a vanguard. They possess insights into the workings of the social world that derive from their social theory but are denied to the laity mired in the swamp of common sense and everyday understandings.
Heh. Well that does explain a lot…
On the topic of propaganda scholarship, Riley is a co-author on this paper:
In it, you will find this statement:
Multiple studies have systematically documented bias in every aspect of academia … including … hiring decisions (Nielsen, 2016) …
So, this means Nielsen (2016) found biases against women in hiring, right? I mean, that’s what Riley (et al.) are claiming the paper found, right? I tracked down Nielsen (2016), and Nielsen’s rhetoric is, indeed, all about biases against women. But let’s look at Nielsen’s actual findings. Patterns of hiring by sex are reported in two tables. Table 4 is for all full and associate professor hires.
Table 4, above, is all full and associate professor hires, including both temporary and permanent ones (the study was conducted in Denmark, and, as far as I can tell, a “temporary” position is kinda like the U.S.’s untenured position, and “permanent” is kinda like our tenured positions). Nielsen reports the results for “permanent” positions (which I glean as only full professors) separately in Table 5:
Nielsen reported no statistical tests for sex bias, so I performed 3 chi-square tests for 2 x 2 contingency tables (1. Male v. Female by Applicants v. Hired for all Full professors, Table 4; 2. Male v. Female by Applicants v. Hired for all Associate professors, Table 4; 3. Male v. Female by Applicants v. Hired for all Full professors, Table 5). Tests 1 and 3 were nonsignificant, meaning there was no evidence of bias. Test 2 was significant (p<.01), but look at the table numbers! It means there was bias … against men! So mostly Nielsen had no evidence of hiring discrimination, and the evidence he had favored women.
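For readers who want to run this kind of check themselves, here is a minimal sketch of a chi-square test for a 2 x 2 table in Python. Everything specific in it is an assumption for illustration: the function name chi_square_2x2 is made up, the counts in the usage lines are placeholders and NOT Nielsen’s numbers, and it computes a plain Pearson chi-square with no continuity correction, which may or may not match the exact variant used above. Swap in the counts from Tables 4 and 5 to check the three tests.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a nonzero total")
    # Shortcut formula for a 2 x 2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, chi2 is a squared standard normal, so the upper-tail
    # probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Placeholder counts (hypothetical, NOT taken from Nielsen, 2016).
# Rows: male, female. Columns: applicants, hired.
chi2, p = chi_square_2x2(300, 30, 100, 20)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A significant result only says the male and female ratios differ; you still have to look at the counts to see which direction the difference runs, which is the whole point of Test 2 above.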
Here is Dylan Riley’s statement again:
Multiple studies have systematically documented bias in every aspect of academia … including … hiring decisions (Nielsen, 2016) …
Now, it does not say “bias against women” but this is obvious from context. And in case you have any doubt, here is a section header summing up later in the paper:
Academic hiring, tenure decisions, and promotions favor men
They don’t, of course. This conclusion comes from a very selective citation of studies that do show biases favoring men and completely ignoring those that show biases favoring women. The full list can be found here:
OF COURSE, as a master practitioner of propaganda scholarship, Dylan Riley will denounce any method capable of exposing propaganda masquerading as scholarship. It tells you we are on the right track.

I am a bit skeptical about using AI for conceptual stuff. First of all, we can probably get the AI to do what we want by changing the prompt. Have you tried "How well is the target paper dealing with Jussim's criticisms?" or anything that suggests that the paper would actually do it well? And also, LLMs are likely to help ideological captures of disciplines, because if everyone writes "Jussim's criticisms are unjustified" (I am devil's advocate here, I do NOT think that), this is what the LLMs will parrot.
Very interesting, thank you for doing this! I wonder what Claude's default critique of the racial micro-aggressions paper would have looked like sans Lee's critique in advance. As coherence engines, LLMs are quite charitable to the line of thinking you want them to think on. I have to game the AIs, pitting Claude and ChatGPT against each other, back and forth, and use other foils in order to get a fair critique of my thinking. BTW: should you scale the vision up, I have been working on a metascience engine that quantifies, visualizes, and communicates the strength of any claim. There might be overlap. Thanks again for the good work!