I was at a conference a few weeks ago when a presenter gave the practice of evaluating large models a bit of a kicking. They argued that the central issue with evaluations is that they focus on assessing potential capabilities, which means that they don’t give an accurate representation of use at the ‘instance’ layer (i.e. how a model tends to respond across a series of discrete interactions with users over time). As a result, they reckoned that popular benchmarks like MMLU aren’t that useful for understanding how models are typically used in the real world.
This idea stands in contrast to one discussed by AI evaluations outfit Apollo Research last week. Advocating for a ‘science of evals’, they said that researchers typically struggle to identify the upper bounds of capabilities—essentially, what a model is potentially capable of—because you can eke out better performance by using exotic prompting regimes. The upshot is that evaluations aren’t actually very good at determining capability thresholds, which means that we don’t fully know what model safety profiles look like. For example, researchers found that the vanilla GPT-4 model beat the specially designed BloombergGPT at analysing financial text, while Microsoft used clever prompting to get GPT-4 to surpass specially fine-tuned medical models on the MultiMedQA benchmark suite.
These two ideas amount to scoping risks associated with certain models from above (i.e. determining the upper boundaries of capabilities) and from below (i.e. understanding what typical usage actually looks like once we take a model out of the lab).
But in both cases, something is missing.
Evaluations that take place at the capabilities layer and those that take place at the instance layer are two different ways of helping us scope the risk profile associated with a particular model. Both modes are useful in their own right, but neither tells us how dangerous it is to actually release a model relative to those risks already accepted by society. This is what we call marginal risk: the risk presented by a new technology relative to risks posed by existing technologies.
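As a rough way of formalising the idea (my own shorthand rather than an established definition), you can think of marginal risk as the difference in expected harm with and without the new technology, holding everything else fixed:

```latex
\[
  \mbox{marginal risk} \;=\;
  E[\,\mbox{harm} \mid \mbox{new technology available}\,]
  \;-\;
  E[\,\mbox{harm} \mid \mbox{existing technologies only}\,]
\]
```

The second term is the baseline society has already, implicitly or explicitly, accepted.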
Take open-source. The core question for understanding the most appropriate way to respond to permissive access regimes (if at all) is how much more likely an open model is to cause harm compared to a closed one. Some people think that open-source models have a lower marginal risk than closed models, while others think the opposite is more likely. In each case, it depends where you think the offence-defence balance lies: do you think that an open-source model is more likely to favour those who want to cause harm or favour those who want to protect against it?
In practice, though, we just don’t know. As a study from Stanford recently explained: “Rigorous evidence of marginal risk remains quite limited. This does not mean that open foundation models pose no risk along these vectors but, instead, that more rigorous analysis will be required to ground policy interventions.”
Marginal risk isn’t the only factor to consider when designing policy. We also need to think about maximally bad outcomes, the likelihood of these outcomes happening, the distribution of harms and benefits, and the opportunity cost of not deploying a particular model.
In the aviation industry, for example, we know that planes have the potential to crash but we still let them fly. That is in part because organisations like the International Civil Aviation Organization (ICAO) have helped to develop and promote a programme of standardisation that has increased safety, but it’s also because we factor in the economic benefits of air travel when making calculations about its risk profile.
Ultimately, though, I’m writing this post because too often people who work in AI governance (myself included) don’t do a good enough job of contextualising risk. Understanding worst-case scenarios and real-world usage are both laudable goals that should make up dedicated programmes of research, but we should also strive to understand the extent to which AI represents a departure from the array of risks we all face every day as citizens of the modern world.
This is not to say that I want to wave away the many risks posed by AI, from dangerous capabilities connected to the most powerful models to the harms perpetrated by the deployment of systems built using unrepresentative data or used in inappropriate contexts. It may be that in lots of situations AI represents a risky departure from the use of existing technologies (and for extreme risks, we are clearly dealing with a degree of novelty).
But without conducting experiments designed to test for absolute and marginal risk, we simply don’t know.
Because the internet
Consider misinformation. Only a couple of weeks ago, the World Economic Forum said that in the next two years AI-powered misinformation and disinformation represented a greater risk than “economic downturn[s]”, “extreme weather events”, and, uh, “interstate armed conflict”.
As for how they got to this conclusion, the authors essentially amalgamated opinions from “1,490 experts across academia, business, government” in order to create a rough index of top risks over the next 24 months. On pages 18-21 they explain the reasoning behind the experts’ views, which boils down to what the report describes as the emergence of “large-scale artificial intelligence (AI) models” that “have already enabled an explosion in falsified information and so-called ‘synthetic’ content”.
The problem, of course, is that AI hasn’t yet hopelessly degraded our epistemic security—despite the emergence of powerful and widely available tools that have the potential to do so. Researchers at Harvard, for example, tackled the three most common arguments about AI’s impact on the information environment (increased quantity of misinformation, increased quality of misinformation, and increased personalisation of misinformation) and found that each was over-egged. As the authors explained, “existing research suggests at best modest effects of generative AI on the misinformation landscape.”
Misinformation doesn't really make sense as a framing because a) most people don’t consume much outright false information, b) popular definitions of misinformation have been broadened to encompass true but misleading content, and c) based on the latter definition, you get into the absurd situation whereby ‘misinformation researchers’ end up spreading misinformation themselves. There’s a good piece from Dan Williams if you are interested in understanding the problems in more detail, but I do think that the concept of misinformation is useful for understanding why marginal risk matters.
I like the Harvard report because the researchers compare the emergence of AI-powered misinformation to existing risks (in this case, the generation and proliferation of misinformation enabled by the internet and associated communication technologies). This is what we mean by marginal risk: investigating the risk profile of future technology X compared to existing technology Y. There are essentially two parts to this process. First, as above, we assess whether AI is enabling increased quantity, quality, and personalisation of misinformation compared to existing methods. Second, we look for evidence that AI is driving changes along these axes in the real world—rather than asking whether models have the capability to exacerbate the problem in principle.
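To make that two-part comparison concrete, here is a minimal sketch of what a marginal-risk style evaluation might look like in code. Everything in it—the metric names, the numbers, the axes as I’ve labelled them—is hypothetical and purely illustrative; the point is simply that we report the difference between an AI-assisted condition and an internet-only baseline rather than the AI condition’s absolute scores.

```python
# Minimal sketch of a marginal-risk comparison for misinformation.
# All metric names and figures below are hypothetical illustrations,
# not measurements from any real study.

from dataclasses import dataclass


@dataclass
class Condition:
    """Observed misinformation metrics under one technology regime."""
    name: str
    quantity: float         # e.g. items of false content produced per operator-hour
    quality: float          # e.g. share of readers who rate the content as credible
    personalisation: float  # e.g. share of messages tailored to the recipient


def marginal_risk(new: Condition, baseline: Condition) -> dict[str, float]:
    """Report the difference along each axis, not the new technology's absolute capability."""
    return {
        "quantity": new.quantity - baseline.quantity,
        "quality": new.quality - baseline.quality,
        "personalisation": new.personalisation - baseline.personalisation,
    }


# Hypothetical numbers purely for illustration.
internet_only = Condition("internet baseline", quantity=40.0, quality=0.55, personalisation=0.20)
ai_assisted = Condition("AI-assisted", quantity=55.0, quality=0.60, personalisation=0.35)

print(marginal_risk(ai_assisted, internet_only))
# A small delta here would suggest limited marginal risk, even if the
# AI-assisted condition looks alarming in isolation.
```

The second part of the process—gathering the real-world evidence that would feed numbers like these—is the harder empirical task, and it is exactly the kind of rigorous evidence the Stanford authors say remains limited.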
So, the next question is, why exactly does the ability of AI to degrade the information environment seem overpriced? Well, as I will argue here and for a couple of other examples below, the internet (especially social media) has inoculated us against some of the risks posed by AI. After years of discussion about the deteriorating quality of our information diet, we are all, consciously or not, primed to be sceptical of the things we see online.
But it goes further than that.
A few months ago, a friend managed to (almost) fall for one of those postal delivery scams that we all know and love. They didn’t lose any money—and, in their defence, were in the middle of moving house so were both stressed out and expecting a lot of parcels—but the experience underscored how difficult it is to distinguish between good and bad actors when scammers have got you in their sights.
They said it was sort of like being in the middle of David Fincher’s reality-warping The Game, so much so that they struggled to believe their bank had really secured the account, given some of the grammatical choices the fraud department made in its communications. Interestingly enough, the UK’s National Cyber Security Centre recently said that AI will assist ‘social engineering’ efforts, including by helping scammers create emails free of spelling mistakes. That may be true on paper, but the issue is that, when even legitimate corporate communications are treated with suspicion, you have to wonder how much of a difference AI-powered typo-free messages will make to scammers. (Perhaps in a year or two fraudsters will deliberately add mistakes to appear more human.)
Now, anecdotes are not evidence, but I truly struggle to see how AI could make that particular instance of fraud—which involved a volley of phoney phone calls and emails—any more effective given the already very impressive abilities of the scammers. The emails were slick, the acting convincing, and the text messages well written.
And yet the NCSC isn’t the only official body to link AI to a rise in fraud. Representatives from Europol (the law enforcement agency of the European Union) told The Guardian that the agency had recorded a sharp uptick in fraud on dating and social media apps. According to the report, Europol said that LLMs are enabling “criminals to target multiple victims at once”, increasing the number of people they can contact with each individual scam, usually by asking for money to escape a difficult situation.
Unfortunately, the agency didn’t provide any statistics, so we don’t know a) what sort of increase we’re talking about or b) the extent to which AI is responsible for said rise. We do know that, in theory, language models are already good persuaders (see Bai et al., 2023; Jakesch et al., 2023; and Karinshak et al., 2023), but we still don’t have much information about whether AI is fuelling fraud in the real world. For the dating app fraud discussed by Europol, it’s just not clear how much better language models are at writing convincing text than bad actors, how much more convincing AI-generated pictures are for catfishing, and how much better (if at all) audio generation models are compared to real con artists.
Of course, it’s important to remember that not everyone enjoys exactly the same circumstances. Some people are vulnerable and we should do everything in our power to protect them from these sorts of dangers. But that vulnerability is precisely what I’m talking about: if the marginal risk is small, then someone who is susceptible to AI-powered fraud will also be susceptible to good old-fashioned human deception.
Wrapping up
Fraud and misinformation aside, there are concerns about whether AI will inflate beauty standards, whether it will create an overload of content and pollute the internet, and whether it risks “destabilising the concept of truth itself”. In each of these cases, the internet has already demonstrated the risks and forced us to adjust our expectations. We know what we see on Instagram is not a reflection of reality. We look over spam emails with glazed eyes. And we take every headline with a pinch of salt.
I am not saying that the internet has inadvertently protected us against every risk posed by AI. I’m also not saying that, for those risks it has primed us to resist, it offers complete protection. What I am saying is that we need to carefully study the marginal risk posed by powerful models, in this case the extent to which AI is aiding the production, circulation, and efficacy of mis- and disinformation.
The reason epistemic security is a good case study is that it confronts us with the uncomfortable gap between what we think should be happening and what is actually happening. That gap tracks how far our expectations about marginal risk have missed the mark: when the marginal risk of deploying a new technology turns out to be smaller than we anticipated, the distance between speculated impact and real impact grows.
Evaluation protocols designed to test for marginal risk from the outset can help. RAND, for example, tested the ability of large models to create bioweapons, finding that “outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.” More approaches like these—together with efforts to evaluate the upper limits of capabilities and real-world usage—are a good place to start.