The Week in Examples #16 [9 December]
Simulating history, deep unlearning for safety, and global compute
Good morning, folks. After another big week in AI (we’re looking at you, Gemini), The Week in Examples is back with three bits of research that got me thinking. This time around, we have large models as historical simulators, what deep unlearning means for AI safety, and a snapshot of global compute from the Tony Blair Institute for Global Change. As always, message me at hp464@cam.ac.uk for comments, ideas for next time, or to say hello.
Three things
1. The very model of a modern major general
What happened? A new paper from University of Michigan and Rutgers University researchers, which I missed last time around, uses AI to simulate historical conflicts. The group presents WarAgent, a system that uses large language models to simulate the First World War, the Second World War, and the Warring States period in China between 476 BC and 221 BC. According to the researchers, the goal of the effort is to answer the (quite lofty) question: “Can we avoid wars at the crossroads of history?” More specifically, though, the group wanted to know: 1) how effectively LLM-based multi-agent simulations can replicate the historical evolution of decision-making, 2) whether certain triggers for war are more critical than others, and 3) whether historical inevitabilities are truly unavoidable.
What’s interesting? The paper creates individual agents for each country, complete with goals, favourability towards potential allies, and information about the various resources each country has at its disposal. Using these profiles, the researchers ran seven simulations focused on the formation of military alliances, declarations of war, and non-intervention treaties. In every First World War simulation, alliances formed between Britain and France, between the German Empire and Austria-Hungary, and between Serbia and Russia, while war declarations occurred between Austria-Hungary and Serbia, Austria-Hungary and Russia, and the German Empire and Russia. Every simulation ended in war; across the runs, 9 in 10 mobilisations were accurately predicted, four fifths of alliances were deemed historically representative, and almost half of war declarations were said to have unfolded accurately.
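To make the setup a little more concrete, here is a minimal sketch (in Python, and emphatically not the authors’ code) of how a country-agent profile and a turn-based action space might be represented. Every field name, goal, and number below is an illustrative assumption rather than a value from the paper.

```python
# A minimal sketch of a WarAgent-style setup: each country is an agent with a
# profile, and each round the agent's profile plus the shared world state is
# turned into a prompt for an LLM that picks an action. Hypothetical fields.
from dataclasses import dataclass


@dataclass
class CountryProfile:
    name: str
    goals: list[str]                 # e.g. "preserve the balance of power"
    favourability: dict[str, float]  # attitude towards other countries, -1 to 1
    resources: dict[str, int]        # e.g. army size, industrial capacity


ACTIONS = ["form_alliance", "declare_war", "sign_non_intervention", "mobilise", "wait"]


def build_prompt(agent: CountryProfile, world_state: str) -> str:
    """Assemble the context an LLM agent would see before choosing an action."""
    return (
        f"You are the government of {agent.name} in 1914.\n"
        f"Goals: {'; '.join(agent.goals)}\n"
        f"Relations: {agent.favourability}\n"
        f"Resources: {agent.resources}\n"
        f"Current situation: {world_state}\n"
        f"Choose one action from {ACTIONS} and name a target country."
    )


# Illustrative numbers only: each round, every agent gets a prompt like this,
# the chosen action is applied to a shared world state, and the loop repeats.
britain = CountryProfile(
    name="Britain",
    goals=["maintain naval supremacy", "protect Belgian neutrality"],
    favourability={"France": 0.6, "German Empire": -0.4},
    resources={"army_size": 700_000, "dreadnoughts": 29},
)
print(build_prompt(britain, "Austria-Hungary has issued an ultimatum to Serbia."))
```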
What else? Historical simulation is by no means new, with Harold Guetzkow and colleagues creating experiments in the 1960s designed to simulate international conflict. In the 2000s, Ted Dickson conducted a simulation of the First World War with students (not as dramatic as it sounds), while the decade also saw Eric Tollefson create the OneSAF Objective System and Raymond Hill simulate the Bay of Biscay submarine battle of the Second World War. The historian in me, perhaps unsurprisingly, has a few questions about the utility of these sorts of approaches. Because it is not possible to completely replicate the conditions, qualities, and internal states of the parties involved in major historical events, it’s not clear to me that all the compute in the world will make the question “Can we avoid wars at the crossroads of history?” a soluble one. Nonetheless, as far as the field of historical simulation is concerned, this is a nice contribution.
2. Forgetful models remember to stay safe
What happened? MIT PhD student Stephen Casper wrote a post making the case for deep unlearning from a safety perspective. The tightly-written (and short!) article covers a lot of ground to argue that “deep forgetting and unlearning may be important, tractable, and neglected for AI safety.” I’ve seen some interesting stuff on deep unlearning from a privacy perspective, but much less on what unlearning means for broader questions related to the development of safe models.
What’s interesting? The post begins with the (at this point, canonical) idea that large models are good at things we try to make them bad at. Known as the Waluigi Effect after everyone’s favourite anti-plumber, this is the idea that “After you train an LLM to satisfy a desirable property P, then it's easier to elicit the chatbot into satisfying the exact opposite of property P.” It’s a bit like telling a child not to use the chair to climb onto the kitchen counter and open the cookie jar, then being surprised when they do exactly that once you’ve left the room. The logical approach for preventing this Bizarro, mirror-universe dynamic is simple: make sure that a model knows only what it needs to know for the intended application and nothing more.
What else? We currently make big, general—foundational if you will—models that we can fine-tune for specific tasks (though, as we saw last week, clever prompting can actually beat fine-tuned models in specialised domains). The problem with this dynamic from a safety point of view is that finetuning isn’t particularly good at making fundamental mechanistic changes to large pretrained models. As Casper puts it: “Finetuning only supervises/reinforces a model’s outward behavior, not its inner knowledge, so it won’t have a strong tendency to make models actively forget harmful inner capabilities.” To get at that inner knowledge, the post suggests moves like curating training data, drawing on distillation methods to make models smaller, and directly modifying a model using interpretability tools. Taking a step back, my own view here is that unlearning is going to become increasingly important given the very clear privacy implications (we should remember that large models are not excluded from the Right to be Forgotten). Given the obvious incentives for improving unlearning methods, I expect we’ll see more on this one in the future.
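Casper’s post is about directions rather than a specific recipe, but for a flavour of what an unlearning objective can look like in practice, here is a toy PyTorch sketch of a common baseline from the wider unlearning literature (not something proposed in the post): ordinary gradient descent on a ‘retain’ set combined with gradient ascent on a ‘forget’ set. All data and model shapes are dummy placeholders.

```python
# Toy unlearning baseline: push the loss *up* on data we want forgotten while
# keeping it low on data we want retained. Dummy model and random data only.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

# Stand-ins for a "forget" set (e.g. harmful or private examples)
# and a "retain" set (everything the model should keep doing well on).
x_forget, y_forget = torch.randn(64, 16), torch.randint(0, 2, (64,))
x_retain, y_retain = torch.randn(64, 16), torch.randint(0, 2, (64,))

forget_weight = 0.5  # how aggressively to push the forget loss up

for step in range(100):
    opt.zero_grad()
    retain_loss = loss_fn(model(x_retain), y_retain)
    forget_loss = loss_fn(model(x_forget), y_forget)
    # Minimise retain loss, maximise forget loss (hence the minus sign).
    (retain_loss - forget_weight * forget_loss).backward()
    opt.step()

print(f"retain loss: {retain_loss.item():.3f}, forget loss: {forget_loss.item():.3f}")
```

Note that an objective like this still only sees the model’s outputs, which is the sort of limitation the post is gesturing at when it points towards data curation, distillation, and interpretability-based edits instead.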
3. Stakes for global compute out of this world
What happened? The Tony Blair Institute for Global Change (TBI) released a new report analysing the state of compute access around the world. TBI commissioned research group Omdia to carry out an audit of 55 countries against 25 key indicators of the state of a country’s compute ecosystem, including installed server base, local supercomputer capacity, investment, talent, and tax policy. There’s a tonne of great stats in here for getting to grips with the global compute landscape, and I’d encourage anyone interested in this sort of thing to spend some time with the interactive sections of the report.
What’s interesting? A small group of countries are surging ahead in developing compute capacity: the US alone hosts 29.3 million servers, while 47% of the 55 countries analysed house fewer than 20 data centres. Underscoring the unequal distribution of compute resources, the report also found that 47% of the countries assessed have no access deals in place for machine learning cloud infrastructure, 50% have made no investments in quantum computing, and just 35% have agreements in place for academic institutions to access quantum computing infrastructure. My favourite stat, though, was that “in 2023, Meta bought 30 times more H100s (the leading AI chip) than the British government procured for the AI Research Resource, a new national facility.” Yikes.
What else? TBI recommends that what it terms ‘emerging’ compute countries should conduct independent reviews of compute access, share resources through a consortium modelled on the EU’s joint supercomputing initiative, enhance security and resilience, and focus on measures to improve access to skilled workers. Those the authors identify as ‘advanced’ compute nations, meanwhile, ought to build public compute infrastructure, enforce responsible compute governance, and collaborate on quantum computing. For both groups, compute is a question of sovereignty: in the context of AI, it is needed both to run the technology and to carry out foundational research, and states will, to varying degrees, want to ensure they have the capacity to do both. How much capacity is needed, and how far countries should be comfortable relying on foreign capacity, remain open questions.
Best of the rest
Friday 8 December
CMA seeks views on Microsoft’s partnership with OpenAI (UK CMA)
AMD now sees a $400 billion market for AI chips. Why that’s good news for Nvidia (CNBC)
The Race to Dominate A.I. (New York Times)
Letter: Precautionary principle might just save us from AI (FT)
Apps That Use AI to Undress Women in Photos Soaring in Use (Bloomberg)
Thursday 7 December
Digital Life Project: Autonomous 3D Characters with Social Intelligence (arXiv)
Evaluating and Mitigating Discrimination in Language Model Decisions (Anthropic)
UK watchdog warns companies over AI use and privacy (Reuters)
G7 Leaders' Statement: 6 December 2023 (UK Gov)
How to Run Good DARPA Programs (Statecraft)
Black Holes and the Intelligence Explosion (Sequoia)
Wednesday 6 December
Welcome to the Gemini era (Google DeepMind)
AI Ethics Brief #135: Responsible open foundation models, change management for responsible AI, augmented datasheets, and more (Substack)
Updating the Legal Profession for the Age of AI (Yale Journal on Regulation)
How Nations Are Losing a Global Race to Tackle A.I.’s Harms (The New York Times)
Europe was set to lead the world on AI regulation. But can leaders reach a deal? (AP News)
Disputed landmark AI rules face crunch EU talks (Reuters)
Elon Musk’s AI Dreams Are Going in Circles (Bloomberg)
Launch of the AI Alliance (AI Alliance)
Tuesday 5 December
Make no mistake—AI is owned by Big Tech (MITTR)
AI’s Influence on Music Is Raising Some Difficult Questions (TIME)
Legal experts step up to defend wave of AI lawsuits (FT)
Gutting AI Safeguards Won’t Help Europe Compete (Barrons)
A New Trick Uses AI to Jailbreak AI Models—Including GPT-4 (WIRED)
Casey, Colleagues Urge Federal Trade Commission to Track Artificial Intelligence Scams (Office of Senator Bob Casey)
Lawmakers reluctant to stop EU companies profiting from surveillance and abuse through the AI Act (Amnesty International)
AI Davids ride coattails of industry Goliaths (Reuters)
The future of AI is too important to be decided behind closed doors. There is a better way (Fortune)
AI Needs Limits Imposed by Real People (Government Technology)
What should be done about the growing influence of industry in AI research? (Brookings Institution)
Monday 4 December
OpenAI COO Brad Lightcap talks about ChatGPT launch, Dev Day and how Sam Altman thinks (CNBC)
Kiss debuts ‘immortal’ digital avatars and plans to go ‘fully virtual’ (The Verge)
“Or they could just not use it?”: The Paradox of AI Disclosure for Audience Trust in News (SocArXiv)
‘We don’t have the guardrails’ for companies to rush into deploying AI, experts warn (Reuters)
For true AI governance, we need to avoid a single point of failure (FT)
Making Artificial Intelligence Work for Workers (US Gov)
Analyzing the Historical Rate of Catastrophes (Bounded Regret)