Last week I wrote about reasoning models. I argued that — despite some recent flawed work on the subject — they have some curious limitations, and outlined a rough sense of where I expect developers to go in the future based on those shortcomings.
While I was researching the piece, I read lots of recent critical writing about AI. Some of it was good, but much of it read like extremely wishful thinking about what exactly systems can and cannot do.
On the plus side, the experience did help me formulate a simple heuristic to sift through writing about our subject. As soon as I see someone call a large language model a ‘bullshit generator’, I know to take whatever follows with a grain of salt.
Usually that person is an academic. It should go without saying that not all critics are academics and not all academics are critics. But it seems to be critical academics whose voices carry disproportionate weight in shaping public discourse. On a personal level, they're a group I see more regularly as an academic researcher since leaving industry.
The type of person I’m describing is occasionally a technical researcher, but more often a humanities scholar. Normally it’s a person whose work I respect, an otherwise clever thinker who seems to have caught the bug. It’s an unfortunate state of affairs for someone who, like me, counts themselves amongst their number.
‘Bullshit generator’ is a kind of shorthand, one that many academics use to signal to others that they have the right opinions about the AI project. One person says it and then another. And just like that it becomes orthodoxy. Everyone you know rolls it out whenever the opportunity arises, so why shouldn’t you?
Our meme is recycled so consistently because it feels just naughty enough. You can put the phrase in a paper or a newspaper headline and no one will tell you off. It has a forbidden fruit quality to it. Can you believe what we just said!
The sociology of the thing is curious, but it doesn’t tell us why the idea itself — that large models are useless paper tigers that don’t ‘know’ anything — is so attractive in the first place.
I suspect it’s because many of them dislike AI, so they don’t follow it closely. They don’t follow it closely so they still think that the criticisms of 2023 hold water. They don’t. And that’s regrettable because academics have important contributions to make.
Play the classics
I recently suggested that one reason for the animosity towards AI is that people feel like they’ve been duped. They had a vision of what AI ought to be in their head that doesn’t correspond to the technology in reality.
But strange as I think LLMs are, they are still useful. We’re talking about things that millions of people use every single day. Companies are openly saying job displacement is coming. Former US presidents agree.
But you wouldn’t think that was the case if you asked the average academic. They tend to scoff at the idea that anyone might use them for anything. You often hear them say things like:
‘It’s just linear algebra’
‘LLMs don’t know anything’
‘It’s all a PR exercise’
‘Stochastic parrot, stochastic parrot!’
‘Don’t they hallucinate everything?’
The most forceful of these is the one trotted out reflexively: hallucinations. It’s all just made up, isn’t it? Don’t the models get most basic facts wrong?
Well no, not really. Certainly no more made up than some academic papers. You might have had a case back in 2023, but these days hallucinations are much rarer than you think.
On the Hugging Face hallucination leaderboard, the top four models score a factual accuracy rate of more than 99% on a document summarisation benchmark.
You might say the test isn’t a fair one because LLMs do more than summarise. And you would be right to point out that some of OpenAI’s newer reasoning models seem to have bucked the trend on the SimpleQA and PersonQA benchmarks.
But the rest of the stats tell a different story. On the SimpleQA leaderboard, the best-performing models — those that tend to supplement answers with internet search functionality — clock in at between 90 and 95 per cent accuracy.
Fine. Even if they can regularly produce factually accurate information, they still don’t really know anything.
The problem with this line of thinking is that it requires a bit of philosophical wrangling, one that (for reasons unclear) the vast majority of academics seem unwilling to engage in. This is particularly frustrating because if you’re going to make forceful claims about epistemology, it seems rather unsporting to dodge the resulting debate.
When you think about these questions for more than five minutes, it’s pretty obvious that terms like ‘knowing’ or ‘understanding’ are slippery concepts. Never mind ‘truth’ or ‘information’. I don’t feel confident saying much other than that AI definitely knows something.
Occasionally the claim gets tighter and becomes something like ‘LLMs can’t generalise from a small amount of data’, but performance on the ARC-AGI benchmark with just a handful of examples seems to prove that isn’t actually the case.
We also have thinking or reasoning. This one basically says the machines don’t think because that’s only something that humans can do. At best, all they can do is simulate thinking. This one I don’t mind so much, as at least it tries to engage in a substantial argument that gets at the core of the thing.
It might be that language models can only simulate thinking or reasoning. Call me a utilitarian, but what matters to me most is how effective they are in the real world. Whether or not they are simulating thinking has no bearing on whether or not the machines are capable of rearranging the world for better or worse (though if you want to read about what I think is actually happening inside LLMs you can do that here).
And of course there’s the Foucauldian take: it’s all a PR exercise. Obviously, companies like to promote their product. AI is no different in that respect. But to argue that the richest firms in history are deploying trillions of dollars of capital in service of PR is a total non-starter.
You could say that they should be more sceptical of their inventions, but to propose that the entire apparatus of AI development — fighting off competition for chips, building enormous datasets for pretraining, and fine-tuning the model with the help of thousands of human reviewers — is for reputational purposes strikes me as a bit far-fetched.
How to criticise AI
To be clear, I am not down on academics. I am one! I only wish my colleagues would think more critically about their own beliefs, and accept that we simply don’t have enough information to understand where the ceiling is for the AI project as it exists today.
Below are some suggestions, inspired by this excellent post, for what better AI criticism that reflects this uncertainty looks like. It’s not exhaustive, but it gives a rough survey of useful elements for formulating critical commentary.
Things to do
Stay current: Base your claims on recent capabilities by staying up to date with AI research, model deployments, and real-world usage. When critiquing, use the best available models — not convenient strawmen (thankfully we are past the era of slide decks filled with GPT-3.5 gotchas).
Embrace humility: Accept uncertainty as a starting point and modify your approach accordingly. No one fully understands these systems yet (including the people building them). All things being equal, curiosity should precede criticism. In the words of Erling Haaland, stay humble!
Study adoption: Some struggle to believe anyone is actually using AI. But they are. Millions of them. If you want to analyse failure modes, you’ll have plenty to go at by talking to the doctors, lawyers, and students who use the models. But you’ll also see that not every use-case is malicious.
Sample widely: When models work, seek to understand why and under what conditions. When they fail, collect multiple instances across different contexts and ask the same question more than once (see the sketch after this list). A single amusing error tells us little; patterns of failure (and success) across varied conditions reveal the actual boundaries of capabilities.
Be creative: If LLMs don't fit neatly into existing epistemologies, maybe it’s time to make new ones. Rather than forcing these systems into old categories or dismissing them for not fitting, have some fun by developing new conceptual tools. Create the language and frameworks we need to understand AI.
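To make the ‘sample widely’ point concrete, here is a minimal sketch of what that practice might look like. The ask_model helper is hypothetical, a stand-in for whichever model or API you happen to be studying; the point is the repetition and the tallying, not the interface.

```python
import collections

# Hypothetical helper: wire this up to whichever model or API you are studying.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("connect this to the model under study")

def sample_widely(prompt: str, n_runs: int = 20) -> collections.Counter:
    """Ask the same question repeatedly and tally the distinct answers,
    so that one-off slips can be separated from systematic failures."""
    answers = collections.Counter()
    for _ in range(n_runs):
        answers[ask_model(prompt).strip()] += 1
    return answers

# Repeat across several prompts, domains, and model versions, then compare the tallies:
# print(sample_widely('Who wrote Middlemarch?').most_common())
```

Run the same loop across different prompts, domains, and model versions and an anecdote starts to become a pattern.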
Things to avoid
Reductive claims: Related to the above, saying ‘it’s just pattern matching’ explains nothing on its own. If you must make reductive claims, embed them in substantive arguments about what follows from that reduction. Ask whether your reduction captures what matters. Then explain why.
Forecasting with confidence: The history of AI is littered with assured proclamations about what machines will ‘never’ do. Current limitations are empirical facts worth documenting, but extrapolating them into fundamental barriers rarely ends well.
Treating AI as a monolith: Remind yourself that different architectures, training methods, and deployments yield vastly different capabilities. And note that systems are often composites. Understanding which component does what is crucial for meaningful critique.
Cherry-picking: Only citing failures while ignoring successes or dismissing benchmarks that contradict your thesis sounds more like advocacy than scholarship. Intellectual honesty means engaging with the full empirical record, especially the parts that surprise you.
Credentialism: Yes, peer review still matters. But dismissing research because it comes from industry labs or preprint servers rather than traditional journals is self-defeating. In a fast-moving field, the most important findings often emerge outside conventional channels.
Uncharted waters
Many moments in the history of thinking machines can be described by the maxim ‘fake it until you make it’. Too often what looked to be impressive performance was contingent on the man behind the curtain. That’s a thread that runs from the invention of the difference engine right through to the emergence of parallel distributed processing in the 1980s.
But that isn’t happening today. Yes, today’s large models are complexes of data, human input, hardware, and clever algorithms. But they do actually work well for the most part, which is why millions of people use them every single day. In that sense our moment is unprecedented in the history of AI.
But right now, many academics who speak to policymakers or the press badly underestimate the capabilities of the best models. They dismiss LLMs out of hand and don’t engage with the substance of the technology.
Media narratives skew sensational or simplistic, and policymakers end up getting the wrong end of the stick. This is clearly bad if you want to make sure AI is integrated into society in the most socially beneficial way possible.
Accepting the reality of the situation is the best way to produce timely and relevant work. But that requires getting familiar with the technology so that the public debate is grounded in clarity.
AI is a social, cultural, and philosophical event. These are qualities that should make the technology the business of academics. Some are already doing great work, but more are needed to ask the questions the engineers don’t. What do humans do in a world with advanced AI? What kinds of collective failure modes exist when we all begin to use LLMs? And how should these systems be trained, evaluated, governed?
These are human problems, but too many scholars have absented themselves from the conversation. They think refusing to engage is a form of critique, when in fact it’s a form of abdication.
If they wanted to, academics could help define the terms of safe development. They could map the new epistemologies these systems generate, trace their impacts, and build the intellectual scaffolding we need to live alongside them.
But for that to happen they need to accept the AI project for what it is, not what they wish it to be.
Often the basis for AI criticism is a deep dislike of the tech-bro hyper-optimist view: the ‘AGI solves everything’ idea. Academics tend to hate that mindset and so do I. AGI is poorly defined, it’s not here yet, and there’s no evidence that it solves more problems than it creates.
But this valid critique of overreach too often blinds smart people to realities. AI can do things, a lot of things actually, that were previously thought impossible for non-human entities. That is not tech-oligarch hype, it is just reality. And it does need open minds and (yes) new epistemologies. We do not build those by punching straw men. We need to think and build new descriptive ideas, to match AI as it emerges.
‘90 to 95 per cent accurate’? That means a summary containing 20 ‘facts’ can be expected to contain one or two falsehoods; of 20 references in an AI-generated paper, one or two will be made up.
A journalist with that record would be fired instantly upon discovery; an academic with that record would face a university inquiry and would never be taken seriously again.
And yet people are relying on AIs instead of the work of people who get things right for a living. This will not end well.
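For what it’s worth, the arithmetic behind that worry is easy to check. Here is a minimal Python sketch, assuming (simplistically) that the 90 to 95 per cent figure applies per claim and that errors are independent:

```python
# Expected falsehoods in a 20-claim summary at 90-95% per-claim accuracy,
# assuming independent errors (a simplifying assumption).
n_claims = 20
for accuracy in (0.90, 0.95):
    expected_errors = n_claims * (1 - accuracy)
    p_at_least_one = 1 - accuracy ** n_claims
    print(f"{accuracy:.0%} accurate: expect {expected_errors:.1f} false claims, "
          f"{p_at_least_one:.0%} chance of at least one")
```

At 95 per cent per-claim accuracy that works out to one expected falsehood in a 20-claim summary and roughly a two-in-three chance of at least one; at 90 per cent, two expected falsehoods and closer to nine-in-ten.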