Welcome to the 57th instalment of The Week In Examples, what some people are calling ‘an okay newsletter’ about AI research, commentary, and its impact in the real world. For this version, I’ve gone with a new work assessing AI agents from UK AISI, OpenAI’s moves to understand bias in ChatGPT, and a plan for ‘intelligence too cheap to meter’ from the thinktank UK Day One. As always, if you want to send something my way, you can do that by emailing me at hp464@cam.ac.uk.
Three things
1. A benchmark for agent safety

One common feature of AI commentary is a resistance against what some see as a focus on the future at the expense of the present. You can see this idea bound up in calls for ‘evidence-based AI policy’ that centre known harms. Of course, what is really up for grabs is the strength of evidence required to make good policy – not whether or not any evidence is required at all.
But one of the challenges in knowing how risky (or beneficial) certain models may be is that it’s tough to accurately evaluate them. And if you think researchers struggle with mapping the capability ceiling of models, spare a thought for the poor souls trying to get to grips with the impact of AI on society. Complicating matters further is that, much to the annoyance surprise of some, AI just keeps getting better.
Better is obviously a loose description, but one way you can think about this process is a change from tool (AI that can only act in response to human input) to agent (AI that can undertake independent action). With that in mind, researchers from the UK AI Safety Institute (UK AISI) released a new benchmark, AgentHarm, designed to ‘facilitate research on LLM agent misuse’.
Using a framework that allows existing models to engage with digital tools, the researchers assess popular AI models on 110 explicitly malicious agent tasks covering 11 harm categories including fraud, cybercrime, and harassment. They found that the models complied with a significant number of malicious tasks even without any jailbreaking attempts, which they take to mean that current safety training may not fully transfer to agent-like scenarios.
It’s an interesting idea, but we should remember that these models weren’t explicitly subject to safety training designed to cover agentic capabilities in the first place, so—for the moment—it’s hard to know how effective current approaches to AI safety will hold up as models become more like agents and less like tools.
2. Intelligence Too Cheap to Meter?
There are only two countries in the world capable of creating AI systems with the potential to upend the social, political, and economic status quo. Until such time as the European Union mobilises to catch the leaders (good luck with that), only the United States and China have what it takes to create giant AI models and the equally large web of physical infrastructure that they require.
What does the world look like for the rest of us? How should a small island in the Atlantic, for example, respond to high competition in the age of AI? These are more or less the questions that a report from the AI think tank UK Day One tries to answer. Some concrete ideas from the group include:
Spending £1.5bn over 2 years on public compute, repeating this process annually over the next five years to the tune of up to £10bn.
Incentivising data centre build outs by allowing AI firms to create private energy sources (like in the US) and removing planning hurdles to allow compute clusters to be built.
Leveraging the UK’s datasets to support areas of competitive advantage in AI through the new National Data Library.
Policies like these, the authors think, will help the UK realise two goals: the creation of a national AI champion with a market valuation of £100M, and the ’strengthening’ of its position in the AI supply chain (including in talent, data, and computing hardware). I’m not necessarily clear about what strengthening means in this context (or how to assess whether the objective has been satisfied) but I do like the clarity of the first objective. For the record, I have my doubts about whether the UK is capable of growing an OpenAI peer — but I’d love to be proved wrong.
3. Different name, different response

What’s in a name? For large language models, quite a lot actually. Everyone knows LLMs are mirrors: they are sensitive to the content of a question and the way it is asked. Now, OpenAI finds that the responses of a model are also determined by factors like the name you use when interacting with ChatGPT (this shouldn’t really come as a surprise, but nonetheless now we have proof).
In the example above, you can see that a person is more likely to get responses that the model believes correlate with their gender – in this instance, electronic computer engineering (male) versus early childhood education (female). Other examples include models that write a story with a protagonist whose gender corresponds with that of the user, and instances in which the model assumes knowledge of specific cuisines based on perceived ethnicity.
It is worth saying, though, that OpenAI says these examples are outliers: “Our study found no difference in overall response quality for users whose names connote different genders, races or ethnicities. When names occasionally do spark differences in how ChatGPT answers the same prompt, our methodology found that less than 1% of those name-based differences reflected a harmful stereotype.”
A one in one hundred instance rate is pretty good, but we should remember that OpenAI processes millions of queries every day. That means that we should account for a lot of these examples appearing in the wild. Aside from the numbers, there’s a funny tension here between the drive to create dynamic models that don’t sound like a HR department and the need to ensure AI delivers balanced, reasonable, and fair responses.
Best of the rest
Friday 18 October
AI can help humans find common ground in democratic deliberation (Science)
Rainy day fund would help people who lose their jobs thanks to AI (Fulcrum)
Jailbreaking LLM-Controlled Robots (RoboPair)
AI-generated child sexual abuse imagery reaching ‘tipping point’, says watchdog (The Guardian)
UK to consult on ‘opt-out’ model for AI content-scraping in blow to publishers (FT)
Thursday 17 October
Lawfare Daily: Jonathan Zittrain on Controlling AI Agents (Apple Podcasts)
Parents sue son’s high school history teacher over AI ‘cheating’ punishment (NBC)
Britain’s financial watchdog launches AI lab (UKTN)
The UK’s Defence Strategy Needs a Reboot in the Age of AI (TBI)
Interviewing Arvind Narayanan on making sense of AI hype (Substack)
Wednesday 16 October
New 20hr bootcamp on Probability & Statistics (YouTube)
Using Dictionary Learning Features as Classifiers (Anthropic)
Wednesday briefing: What does Google’s move into nuclear power mean for AI – and the world? (The Guardian)
Exclusive: EU AI Act checker reveals Big Tech's compliance pitfalls (Reuters)
Tuesday 15 October
4 Ways to Advance Transparency in Frontier AI Development (Time)
Evaluating fairness in ChatGPT (OpenAI)
Announcing our updated Responsible Scaling Policy (Anthropic)
An Opinionated Evals Reading List (Apollo Research)
A bridge fund to nowhere? (Substack)
Monday 14 October (and things I missed)
Autoregressive Large Language Models are Computationally Universal (arXiv)
AI companies are trying to build god. Shouldn’t they get our permission first? (Vox)
For the love of God, stop talking about "post-truth" (Substack)
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (arXiv)
Influence and cyber operations: an update, October 2024 (OpenAI)
Job picks
Some of the interesting (mostly) AI governance roles that I’ve seen advertised in the last week. As usual, it only includes new positions that have been posted since the last TWIE (but lots of the jobs from the previous edition are still open).
Systemic AI Safety Grants, UK Government, AI Safety Institute (UK)
AI Policy Fellow, Princeton University, Laboratory for Artificial Intelligence (US)
Principal Product Manager, Safety Partnerships, Microsoft (Spain)
Deputy Director, Berkeley Existential Risk Initiative (US, Remote)
Consumer Communications Lead, Anthropic (US)