Often the basis for AI criticism is a deep dislike of the tech-bro hyper-optimist view: the “AGI solves everything” idea. Academics tend to hate that mindset and so do I. AGI is poorly defined; it’s not here yet; and there’s no evidence that it solves more problems than it creates.
But this valid critique of overreach too often blinds smart people to realities. AI can do things, a lot of things actually, that were previously thought impossible for non-human entities. That is not tech-oligarch hype; it is just reality. And it does need open minds and (yes) new epistemologies. We do not build those by punching straw men. We need to think up and build new descriptive ideas to match AI as it emerges.
Agree with all of this. My view is basically that recognising that AI is more sophisticated than many tend to assume should open up more lines of (important) critical work, not close them off!
Yeah I think that’s exactly right—the most interesting philosophical and empirical work arguably can only start once we adopt some humility about what we think these systems can and can’t do; and also about the difficulty in properly assessing what we want to assess.
And I also think there's a bit of bitterness that these tech bros and AI labs didn't actually understand or solve the deep mysteries of linguistics and cognitive science; instead, they simply made a lot of GPUs go brrrr, and these are GPUs that academics don't have.
This is a problem that still needs to be addressed: AI needs to become a tool that helps with linguistics and cognitive science. I think there will be more effort made to connect these fields with AI research.
Some agreement here, although I think it’s not just that academia does not have the GPUs; they don’t WANT the GPUs, because GPUs symbolize emergent outcomes produced in disorganized bundles that don’t match well to linguistic theory or terminology.
AGI is a long-term goal that right now carries far more hype than substance. People are fine being skeptical about it, both in academia and in general society. But LLMs and related systems like ChatGPT and Claude are already very useful in programming, research and brainstorming. Everyone needs to try using them, or at least avoid superficial criticisms of them, especially if you are in a teaching or creative position!
What remains is to find a worthwhile definition of the modern human, and then to start thinking about what a program developed by a human actually is.
“90-95 percent accurate”? That means a summary containing 20 “facts” will contain one or two falsehoods; of 20 references in an AI-generated paper, one or two will be made up.
A journalist with that record would be fired instantly upon discovery; an academic with that record would face a university inquiry and would never be taken seriously again.
And yet people are relying on AIs instead of the work of people who get things right for a living. This will not end well.
Journalists have sub-editors and editors to check for errors. Academics have peer review and journal edits. Discounting that, there's not much difference in accuracy between a human-written first draft and a draft from the best models.
Neither people nor AI are infallible. Obviously, you should check a model's outputs before using them - and you should generally avoid 'relying' on AI too.
Editors (outside The New Yorker, and probably not even them nowadays) do not check journalist’s facts: they rely on the journalist’s integrity and skill, which is why journalists who make things up get fired right away.
Peer reviewers are unpaid and couldn’t possibly check every fact and citation.
And if one can’t “rely” on AI summaries, what exactly should one do with them? Treat them as entertainment?
As someone who had a piece in Time about two weeks ago, I can assure you they absolutely do check facts!
Ok, a few elite magazines still do. Most don’t. Newspapers never have.
A huge proportion of what we regard as information can only be trusted as such because of human professionalism and judgment. AI adoption will lead —has already led—to all sorts of things being accepted as true that simply aren’t.
I use AI every day and it is immensely useful. It gets things wrong far less often than a human assistant would. Stop living in a binary world where they are either 10,000% reliable or they are useless. You come across as a whiny prick, no offense. Just like when you work with a human employee, they can get things wrong, and you need to keep an eye out for that. It’s not a big deal.
It really is a big deal when you start using it for things where accuracy matters.
Courts, money, medical care are not places where inaccuracy is tolerated.
*journalists’
The difference is literally that a human making up citations would be summarily fired the first time.
This is a huge and ongoing obstacle to usefulness in many applications. And it happens to organizations that should know better. See for example https://techcrunch.com/2025/05/15/anthropics-lawyer-was-forced-to-apologize-after-claude-hallucinated-a-legal-citation/
Humans also get vetted, rather than just run through a cost analysis.
That’s a shockingly terrible accuracy rate. It only sounds good out of context.
For comparison, I work for a publishing company and our correction rate is on average less than one per journalist per year, for people who produce a substantial piece of writing (at least 400-500 words) every day, and usually more.
Each piece of writing usually contains dozens of facts and quotes. Even if you account for prevented corrections (which were caught by an editor) that adds maybe a couple per month.
You are basically saying that LLMs make about 10-50x as many mistakes as a professional writer (a rough back-of-the-envelope check of that ratio is sketched below).
Furthermore, most of the corrections we have are fundamentally different from an LLM hallucination. Usually it’s things like an incorrect date, a transposed number, or other boneheaded mistakes which are easily fixed and don’t expose the publisher to much liability. LLMs can fabricate entire quotes and attribute them to companies! Do you have any idea how damaging that can be to a publisher and to the company being ‘quoted’?
Not even getting into the fact that LLMs can go to the other extreme and regurgitate something produced by someone else, effectively opening the company up to a plagiarism lawsuit.
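A rough sketch of that comparison, using only the figures mentioned in this thread (90-95% per-fact accuracy for the model, roughly one published correction per journalist per year plus a couple of editor-caught errors per month, and about two dozen facts per daily piece). These are assumptions for illustration, not measured data:

```python
# Back-of-the-envelope comparison of implied per-fact error rates,
# using the figures stated in this thread (assumptions, not measured data).

# Assumed LLM per-fact accuracy, from the "90-95 percent accurate" claim.
llm_error_low = 1 - 0.95   # 5% of facts wrong
llm_error_high = 1 - 0.90  # 10% of facts wrong

# Assumed journalist output: one piece per working day, ~24 facts per piece.
pieces_per_year = 250
facts_per_piece = 24
facts_per_year = pieces_per_year * facts_per_piece  # 6,000 facts

# Assumed journalist errors: ~1 published correction per year
# plus ~2 editor-caught mistakes per month.
errors_per_year = 1 + 2 * 12
human_error_rate = errors_per_year / facts_per_year  # ~0.4% of facts

print(f"Human per-fact error rate: {human_error_rate:.2%}")
print(f"LLM per-fact error rate:   {llm_error_low:.0%}-{llm_error_high:.0%}")
print(f"Ratio: roughly {llm_error_low / human_error_rate:.0f}x to "
      f"{llm_error_high / human_error_rate:.0f}x more errors per fact")
```

The exact multiple obviously depends on the assumed facts-per-piece and error counts, which is why the 10-50x figure above is a range rather than a point estimate.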
While I found your comment really insightful, I think it’s useful to look at the human baseline performance on the SimpleQA benchmark: “We found that the [human baseline] matched the original agreed answers 94.4% of the time, with a 5.6% disagreement rate.”
Further, they state that the error rate of the test may be as high as 3%.
I think it would be really interesting if your company would answer the questions in that test, or, even better, create its own benchmark based on your experience over the years - that would give us a much more realistic comparison to work with!
If you're going to do that, you should include a baseline for AI models as well - including the poorly developed or erratic and unknown LLMs that don't make WaPo headlines. Why is it a fair comparison to pit a program that required hundreds of billions of dollars to develop against someone who may have had very little education, no interest in learning, or no interest in participating? Let's at least try to compare apples to pears.
I mean, it is quite a tall order to ask academics to adapt their epistemologies and accept the AI project for what it is, given the complete lack of self-criticism in AI and the abundance of fantastical prophecies of unlimited and unstoppable progress - just check out what the field's 'godfather' G. Hinton or D. Amodei have been blessing humanity with lately. I wonder why not a single other scientific field requires one to rethink their epistemology? And it really requires a special kind of self-delusion to claim that there has been no substantive scientific critique of AI; on the contrary, the critique is pretty much as old as the field. The problem is that ignoring such critique has virtually become part of the AI researcher's job description nowadays.
'And it really requires a special kind of self-delusion to claim that there has been no substantive scientific critique of AI' - absolutely, and not something that this post claims!
Well then why strawman perfectly respectable arguments? For instance, take the case of hallucinations, since you mention them extensively in the post: they are a natural and inevitable consequence of the fact that LLMs are stochastic generative models with limited coverage in their training data. No serious academic claims that LLMs hallucinate all the time; the argument is that hallucinations are inevitable when the LLM is presented with contexts outside its training data, due to the nature of the system.
In no way is 170 words 'extensive'. As for your claim, yes, obviously hallucinations are a natural function of the nature of LLMs. That tells us nothing. The point is a) that they are not as common as critics assume them to be, and b) that they are becoming less frequent over time.
What does it even mean that they do not hallucinate frequently? It all depends on what you are asking them: ask them some mundane question and they will do just fine, but ask them anything beyond some threshold and hallucinating is all they do.
Also consider that AI systems are increasingly using real-world grounding. Rather than just relying on training data, they can search the web. Or, for example, when generating code they can then run that code; if the code doesn’t run due to a hallucination, the system can fix it. As a practical matter it doesn’t really matter much (for coding) at this point, other than to increase duration and/or cost.
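For what it's worth, the coding loop described above can be sketched in a few lines. `generate_code` here is a hypothetical stand-in for whatever model API is actually being used, so this is an illustration of the pattern rather than any particular product:

```python
import subprocess
import tempfile

def generate_code(prompt: str) -> str:
    """Hypothetical stand-in for a call to a code-generating model."""
    raise NotImplementedError("replace with a real model call")

def run_with_self_correction(task: str, max_attempts: int = 3) -> str:
    """Generate code, execute it, and feed any failure back to the model.

    This is the grounding loop described above: a hallucinated API or a
    syntax error surfaces as a runtime failure, which becomes part of the
    next prompt. The practical cost is extra attempts (duration and money).
    """
    feedback = ""
    for _ in range(max_attempts):
        code = generate_code(task + feedback)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True,
                                text=True, timeout=30)
        if result.returncode == 0:
            return code  # the code ran cleanly, so return it
        # Otherwise append the error output so the next attempt can address it.
        feedback = f"\n\nThe previous attempt failed with:\n{result.stderr}\nPlease fix it."
    raise RuntimeError("no runnable code produced within the attempt budget")
```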
Lmao, you are literally like the critics he mentions in his article that act like they haven’t used an LLM since 2023. Dude, wtf are you talking about? If you ask them about something not in their training data, they will most likely tell you that they don’t know the answer to that question. This problem was solved years ago (eons on AI-time).
So hallucinations in LLMs have essentially been solved years ago? - I guess the research community has somehow failed to notice.
> The problem with this line of thinking is that it requires a bit of philosophical wrangling, one that (for reasons unclear) the vast majority of academics seem unwilling to engage in. This is particularly frustrating because if you’re going to make forceful claims about epistemology, it seems rather unsporting to dodge the resulting debate.
It is very, very, very, very, very annoying that AI believers continue to play dumb about this point. Pattern-matching words is different from thinking! And this is so incredibly obvious that I suspect that AI believers are either lying or are just philosophical zombies. Like, I'm honestly not sure if some of the people involved in this debate possess qualia.
Even a human who grew up alone and therefore was incapable of language would still be able to think. This is because we have words, which *signify* things, and reference objects, or *things-in-themselves* which are *being signified.* As I mentioned in another comment elsewhere, putting "ball" and "bat" together because they appear in sentences together often is *categorically different* than putting them together because you've been to a baseball game and so you know that people use bats to hit balls.
That these are *two different things* is so obvious that if you say you don't understand the difference between them, you're either lying or a philosophical zombie.
RE: hallucinations, this is another area where AI believers just don't understand skeptics and are therefore not responding to the point. Skeptics aren't citing the rate of hallucinations; they're talking about the *cause* of hallucinations. Humans make errors because they have a false belief about the underlying reference objects; LLMs make errors because they pattern-match words with a psychopathic disregard for the truth. A well-calibrated LLM can reliably tell the truth *despite not knowing what that is* if sufficiently trained by motivated researchers. But imagine claiming that a properly calibrated clock *literally knows what time it is* simply because it always shows the correct time. That's insane!
RE: investment; companies are pouring lots of money into AI because 1) they think AGI is possible because *financially interested researchers have told them so* and 2) LLMs will have a lot of economic applications even though they will, in my view, never become AGI. Even glorified pattern-matching software can automate lots of tasks that currently only require very good pattern-matching skills, like technical writing, clerical work, basic copywriting, basic coding, basic data analysis, etc. LLMs will still be impactful and disruptive - but there's no need for us to get hyped into believing outlandish claims that pattern-matching robots are going to become AGI just because people who have a direct financial interest in investors believing this say so.
A lot could be said, but in short, two important papers on the subjects you mention:
LLMs can learn to reason without words https://arxiv.org/abs/2412.06769
LLMs naturally form human-like object representations https://www.nature.com/articles/s42256-025-01049-z
LLMs are ungrounded structuralist models: they only process semantics based on the internal relationships between tokens within text. We have now had over a century of debate over whether structuralism is an accurate description of human language. The argument remains unresolved, although LLMs have rather dramatically demonstrated how far structuralist models can take you.
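A toy illustration of that structuralist point - deriving "meaning" purely from co-occurrence inside a text, with no grounding in anything outside it. The corpus and the method here are invented for illustration and are not how LLMs are actually implemented:

```python
# Toy distributional "semantics": word similarity derived only from
# co-occurrence inside a tiny corpus, with no reference to the world.
from collections import Counter
from itertools import combinations
import math

corpus = [
    "the batter swung the bat at the ball",
    "the pitcher threw the ball past the batter",
    "she hit the ball with the bat",
]

# Count how often each pair of words appears in the same sentence.
cooc = Counter()
vocab = set()
for sentence in corpus:
    words = set(sentence.split())
    vocab.update(words)
    for a, b in combinations(words, 2):
        cooc[frozenset((a, b))] += 1

def vector(word):
    # Each word is represented purely by its co-occurrence counts.
    return [cooc[frozenset((word, other))] for other in sorted(vocab)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return dot / norm if norm else 0.0

# "bat" and "ball" come out as similar purely because of how the tokens
# are distributed in the text - nothing here has ever seen a baseball game.
print(cosine(vector("bat"), vector("ball")))
```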
I'd like to phrase it as being a skeptical advocate of AI. I'm a short-term pessimist and long-term optimist. There's a lot that still sucks but it will only get better.
In my personal experience, instances of AI hallucinations (aka poor statistical inferencing) are much more frequent, especially when it comes to reasoning tasks and anything involving complexity and nuance. Plus, the second a model accesses web content, everything gets worse.
Interesting. My experience is that “hallucinations” are extremely frequent without the system consulting the web, but less frequent when web searches are allowed. Would be interested in hearing more about the problems with web search.
Fair point. My comment conflated two issues: 1) I experience the hallucinations on a regular basis; and 2) I find LLMs provide poor quality or incomplete information when they access the internet for answers. These issues stem from separate technical processes. The second is not a hallucination.
Thanks - that’s really helpful. My experience reflects yours. For some of my work in business I am finding the output good enough to be useful, albeit with careful supervision.
What I think LLMs are most useful for is assisting with brainstorming and providing constructive criticism.
Get back to me when an LLM gets on a plane, flies to an archive, does original research, and then contributes to the sum total of human knowledge by writing it up. Right now all you’ve got is a Chicago Sun-Times summer reading list with 4 books that don’t exist.
Agreed, although curators are working hard to digitise many of these collections. My daughter’s PhD is sourced from a combination of trips to physical archives in Dublin and consulting obscure corners of the internet.
Should've included a Gary Marcus meme; what great self-restraint you have shown. Nice piece.
You essentially do what you critique, just in reverse; each of your suggestions applies as a criticism of your own article when read in reverse.
There is a certain irony given the role of deconstruction in the debate over structuralist models of language.
Insightful piece, Harry. You articulate with clarity something I’ve also been observing with growing concern: the performative nature of much academic AI criticism, where terms like “bullshit generator” become ritual signals rather than analytical positions. Your point about this critique-as-posture is especially sharp — and needed.
I particularly appreciated your call for humility and empirical engagement.
I recently wrote The 3D of the AI Religion (https://mirrorsofthought.substack.com/p/the-3d-of-the-ai-religion), which looks at how dogmatism (of both the utopian and nihilistic kind) has taken over the conversation. But your list of practical suggestions for better criticism is what the field really needs: not just a map of what to reject, but a sketch of how to think better.
Thanks for advancing the conversation.
Calling it a “bullshit generator” feels clever, but it’s lazy. It’s the kind of move a freshman makes to sound deep without doing the work.
How about "a system that generates plausible but potentially unfounded responses"? Don't worry, I'm not lazy I just asked perplexity.
“Bullshit” has a respectable definition in academic circles (Frankfurt 1986, 2005), although that definition includes a degree of agency on the part of the source that most academics using the term probably wouldn’t grant an LLM.
It could also be that, from a very technical perspective, AI is really just doing pattern matching, so when people try to attribute blame to AI, or go on about AI overlords, then as an actual AI researcher I think they should go read the science behind this.
I was in a book club with someone who went to try AI and then just ranted about how the AI was lying to him, because he asked it for some information and it gave an incorrect answer.
We were telling him that that’s because his AI likely did not have access to the Internet, and the information it did have probably predated that website.
He was adamant that it did have that information, for he had asked it something else and it answered correctly, and that I was being ridiculous because of course it has access to the Internet just like him: it is on the Internet.
These people don’t understand tooling, but they do believe that AI is incredibly intelligent, because it sounds intelligent. On a few topics. On the basis of this tiny amount of testing they are satisfied to trust it until it fails them, and then they… start to treat it like a malicious human.
I think the lack of understanding about the flaws of LLMs is what makes us researchers adopt a defensive stance against AGI madness.
Realistically, AI is not going to take over the world, but we may have an ethics problem when people depend on AI and don’t realise that it is built from biased data. Because we are biased.
Sexism in natural language processing is a good example. It is more worrying that people will use AI to try to evaluate, say, resumes based on past hiring decisions, without realising that they may be perpetuating the biases of before.
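A minimal synthetic sketch of that resume-screening worry: if the historical hiring labels a system learns from were biased, even a dead-simple data-driven screen reproduces the bias. The groups, numbers, and "model" here are all invented for illustration:

```python
# Synthetic illustration: a screen learned from biased historical hiring
# decisions reproduces the bias. All data and numbers are invented.
import random

random.seed(0)

def past_hiring_decision(skill, group):
    # Pretend the archive is biased: equally skilled candidates from
    # group "B" were hired less often.
    penalty = 0.3 if group == "B" else 0.0
    return 1 if skill - penalty + random.gauss(0, 0.1) > 0.5 else 0

# Build a "historical" training set of (skill, group) -> hired.
candidates = [(random.random(), random.choice("AB")) for _ in range(5000)]
labels = [past_hiring_decision(skill, group) for skill, group in candidates]

def learned_hire_rate(group, lo=0.6, hi=0.8):
    # The simplest possible data-driven screen: hire at the historical rate
    # observed for this group within a given skill band.
    outcomes = [y for (skill, g), y in zip(candidates, labels)
                if g == group and lo <= skill < hi]
    return sum(outcomes) / len(outcomes)

# Equally skilled candidates, very different learned outcomes:
print("Group A, skill 0.6-0.8:", round(learned_hire_rate("A"), 2))
print("Group B, skill 0.6-0.8:", round(learned_hire_rate("B"), 2))
```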
Thanks for this. I’ve found talking to some academics about LLMs to be a bit crazy-making. It’s clear they haven’t even bothered to play with a good model for any sustained amount of time and are just vibes-matching “big tech bad; ai is big-tech; ai is bad”.
For a lot of us, including myself, the experience of using an LLM gives a strong impression that it is more than a “mere tool” or “stochastic parrot”.
So now I work on how to understand and speak about this.
Mostly I have a problem with the blind and fervent assertion that language models (stronger form: language models alone) are THE WAY to AGI. When image recognition was making huge progress a decade ago, no one was foolish enough to claim that it was AGI. Machines that speak-ish natural-ish language-ish are a huge soft spot for many, since we are hardwired to feel at home with natural language.
I think we should drop the term “hallucinations” when referring to AI. Conscious beings hallucinate; systems malfunction. I see this term as just another example of the equation of LLMs with human intelligence (which is much more complex). They’re not intelligent. They’re still useful, but they’re way different from human intelligence.
I think academics should be able to spot the difference and use language accordingly, and not succumb to the AI hype being driven by companies to increase investment, generalized adoption and so on.
Just because some big companies or some former US presidents say something, that doesn’t mean it’s true. Actually, if these kinds of actors are saying something with such confidence, I think it deserves much closer scrutiny. That’s basically critical thinking.