Societal hardening, control theory, forecasting AI policy [TWIE]
The Week In Examples #36 | 18 May 2024
This was the week of the AI girlfriend agent. Google previewed its Astra assistant and OpenAI released GPT-4o, a new version of its flagship product with impressive voice functionality. I’m not writing about GPT-4o this week (you can read good pieces here, here, and here) but it's worth reflecting on some of the more interesting choices made by OpenAI to connect the model to the AI companion from Spike Jonze’s 2013 film Her.
While it’s true that you can opt for a male voice when you use the model, this decision is a useful entry point to discussions about the extent to which people should be encouraged to anthropomorphise agents. The answer is probably not as straightforward as you think. In an ideal world, developers would delegate control all the way down to users (like Google Cloud does), but that doesn’t account for the knock-on effects on society at large. The solution is probably some form of ‘personalisation within bounds’, but that raises an obvious question: what should those bounds be?
Three things
1. Living with advanced AI
What happened? Researchers at the Centre for the Governance of AI argued for measures to improve societal adaptation to ‘advanced AI’, which they define as “AI systems that approach and exceed human capabilities”. They suggest that many actors will have the ability to create powerful AI models, that safeguards can be removed, and that adaptation can promote beneficial uses of AI without inhibiting capabilities. The authors introduce a three-part response to minimise harms from AI at the societal level: making harmful uses more difficult or costly (avoidance); reducing the harm that results when a harmful use does occur (defence); and mitigating negative impacts once harm has occurred (remedy).
What's interesting? This idea has a Copernican flavour: instead of focusing safety efforts solely on the systems themselves, we ought to harden society against the negative effects of AI adoption. Consider election interference using AI. Alongside the work undertaken by developers, the authors argue for societal adaptations like identity verification requirements; public awareness campaigns and content provenance techniques; and, in the worst case, special investigations and rerun elections.
What else? The authors are right that safety approaches ought to ‘look beyond the model’ and consider which interventions can stop AI adoption from causing major harms. That said, the core challenge with this approach is finding the ceiling for these interventions. In the case of rerunning elections, for example, I suspect the cure would be worse than the disease. It doesn’t take much to imagine the reaction to repeating an election once a victor has been declared and supporters have begun their celebrations. Still, the basic premise is solid. The rub is figuring out just how far to cast the net.
2. Control theory for safe AI
What happened? Where the first paper centres the societal impact of AI, this week’s second paper foregrounds how interaction with a model changes its behaviour. Researchers from Princeton and Carnegie Mellon University argue “that meaningful safety assurances for these AI technologies can only be achieved by reasoning about how the feedback loop formed by the AI’s outputs and human behavior may drive the interaction towards different outcomes.”
What's interesting? To make their case, the group draws on the control systems literature, which deals with ensuring that present actions don't lead to future failure. The paper extends the safety filters used in control systems to human-AI interaction, proposing a mechanism consisting of a fallback policy, a safety monitor, and an intervention scheme to ensure that action today does not lead to disaster tomorrow. The result, which also draws on existing AI assurance measures, is what the authors describe as a programme of ‘Human-AI Safety’ that seeks to “determine under what conditions the AI can maintain safety for all allowable realizations of the human’s future behavior.”
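To make the mechanism concrete, here is a minimal, hypothetical sketch (my own illustration under assumed toy dynamics, not code from the paper) of how the three ingredients fit together: a nominal policy proposes an action, a safety monitor checks whether it could lead to an unsafe future state, and an intervention scheme switches to a fallback policy when it would.

```python
from dataclasses import dataclass
from typing import Callable

State = float   # hypothetical 1-D example: distance to a hazard
Action = float  # commanded step towards (negative) or away from the hazard

@dataclass
class SafetyFilter:
    nominal_policy: Callable[[State], Action]   # what the system wants to do
    fallback_policy: Callable[[State], Action]  # conservative action, e.g. stop
    is_safe: Callable[[State, Action], bool]    # safety monitor over predicted outcomes

    def act(self, state: State) -> Action:
        proposed = self.nominal_policy(state)
        # Intervention scheme: only override when the monitor predicts trouble.
        return proposed if self.is_safe(state, proposed) else self.fallback_policy(state)

# Toy usage: never let the next state get within 1.0 units of the hazard.
safety_filter = SafetyFilter(
    nominal_policy=lambda s: -0.5,        # always move towards the hazard
    fallback_policy=lambda s: 0.0,        # stop
    is_safe=lambda s, a: (s + a) > 1.0,   # one-step lookahead safety check
)

print(safety_filter.act(2.0))  # -0.5: proposed action keeps us safe
print(safety_filter.act(1.2))  # 0.0: monitor triggers the fallback
```

The interesting part, and what the paper grapples with, is defining the safety check when the ‘dynamics’ include a human whose future behaviour the AI can only partially predict.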
What else? Both of these papers show us that, for better or worse, the impact of AI is already being felt, regardless of whether better systems are just around the corner. Even if the scaling laws don’t hold (though I wouldn’t bet on that), improvements in user interfaces, clever scaffolding tricks, better tool use, and good old-fashioned diffusion will see millions of people use AI every day. In fact, millions of people already do. The important questions are: 1) how much can we squeeze out of existing systems, and 2) how much more will scaling buy in terms of capability and, ultimately, value? Remember, the scaling laws don’t have to hold indefinitely. Two more generations of successful scaling probably buys us systems that can act autonomously, usefully, and (with a bit of luck) reliably. Better defences, at the development, interaction, and societal levels, will be needed long before then.
3. A crystal ball for AI policy
What happened? In the final paper of the week, researchers from Northwestern University and the Institute for Information Law used GPT-4 to forecast how well AI policy might mitigate a handful of negative impacts of AI adoption. They prompted GPT-4 to create ‘stories’ about the impact of Article 50 of the EU AI Act, which mandates transparency for AI-generated content. Stories in hand, the group ran a user study with 234 participants to assess the scenarios along four dimensions: severity, plausibility, magnitude, and specificity to vulnerable populations. They found that transparency legislation is perceived to mitigate harms in areas such as labour and well-being, but is seen as less effective in areas like social cohesion and security.
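For a sense of the basic pipeline, here is my own hypothetical sketch (not the authors' code, and assuming the `openai` Python client with an API key configured): prompt a model for scenarios about a policy's impact, then average participant ratings across the study's four dimensions.

```python
from statistics import mean
from openai import OpenAI  # assumption: openai Python package, API key in the environment

client = OpenAI()
DIMENSIONS = ["severity", "plausibility", "magnitude", "specificity_to_vulnerable_populations"]

def generate_scenario(policy: str, impact_area: str) -> str:
    """Ask the model for a short story about how `policy` plays out in `impact_area`."""
    prompt = (
        f"Write a short, concrete scenario describing how {policy} "
        f"affects {impact_area} over the next five years."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def aggregate_ratings(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average participant scores (e.g. Likert ratings) for each dimension."""
    return {dim: mean(r[dim] for r in ratings) for dim in DIMENSIONS}

# Example: one scenario about Article 50's transparency rules and the labour market.
story = generate_scenario("Article 50 of the EU AI Act (AI content transparency)", "labour")
```

The hard part, of course, is everything this sketch leaves out: prompt design, sampling enough scenarios to cover the space of plausible impacts, and recruiting participants whose ratings mean something.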
What's interesting? It sounds obvious, but policy people don’t know what the impact of a policy will be until after it is introduced. Of course, they try to model the likely effects (sometimes via quantitative studies, though mostly through qualitative scenario planning), but the only way to know which outcomes will actually materialise is to implement a proposal and assess its impact. And of course, policy has an alignment problem of its own. While interventions are usually made with the best of intentions, they often come with hidden costs. As Paul Graham explains in his piece on solvency limits on procurement: “the real costs are the ones you never hear about.”
What else? While AI-generated stories about the impact of policies are no substitute for evidence, they do represent an attempt to tease out the downstream effects of a policy before it takes effect. The challenge is to create reliable scenarios that weigh the known and the unknown (and the intended and the unintended) in equal measure. For that sort of work, you have to imagine that good old-fashioned human experts are still a better bet than today’s most powerful models. That might change in the future, but we’re not there yet.
Best of the rest
Friday 17 May
Do Chatbots Dream of AI Poetry? Calvino, Madness and Machine Literature (Faber)
International scientific report on the safety of advanced AI: interim report (UK Gov)
Safety first? Reimagining the role of the UK AI Safety Institute in a wider UK governance framework (Ada Lovelace Institute)
Introducing the Frontier Safety Framework (Google DeepMind)
Business locked in expensive AI 'arms race' (BBC)
AI Fake Bylines on News Site Raise Questions of Credibility for Journalists (Bloomberg)
Beyonce and Adele publisher accuses firms of training AI on songs (BBC)
Thursday 16 May
How Far Are We From AGI (arXiv)
OpenAI and Reddit Partnership (OpenAI)
Launch of the Global Centre for AI Governance (GCG)
MMLU-Pro Benchmark (Hugging Face)
How Dominic Cummings’ favourite AI firm captured the British government (Politico)
Researchers build AI-driven sarcasm detector (The Guardian)
AI voiceover company stole voices of actors, New York lawsuit claims (Reuters)
How to Hit Pause on AI Before It’s Too Late (TIME)
AI Safety Newsletter #35: Lobbying on AI Regulation (Substack)
Wednesday 15 May
John Schulman (OpenAI Cofounder) - Reasoning, RLHF, & Plan for 2027 AGI (Substack)
Dynamics of Corporate Governance Beyond Ownership in AI (CommonWealth)
The State of AI Safety in China Spring 2024 Report (Concordia)
OpenAI Chief Scientist Ilya Sutskever Is Leaving the Company (Bloomberg)
Schumer’s slow-walk on AI ‘regulation’ is nothing but a boon for Big Tech (The Hill)
Doomers have lost the AI fight (Axios)
Facilitating Opinion Diversity through Hybrid NLP Approaches (arXiv)
How AI could make workers more productive – but paid less (The Hill)
Tuesday 14 May
Desk-AId: Humanitarian Aid Desk Assessment with Geospatial AI for Predicting Landmine Areas (arXiv)
Google’s Gemini updates: How Project Astra is powering some of I/O’s big reveals (TechCrunch)
Laughing, chatting, singing, GPT-4o is AI close to human, but watch out: it’s really not human (The Guardian)
A.I.’s ‘Her’ Era Has Arrived (NYT)
Google’s invisible AI watermark will help identify generative text and video (The Verge)
Monday 13 May
US and China to hold first talks to reduce risk of AI ‘miscalculation’ (FT)
“Ground-breaking moment for science, innovation and technology” as UK’s most powerful supercomputer is officially online and debuts in global league (University of Bristol)
The Platonic Representation Hypothesis (arXiv)
Big Tech Companies Were Investors in Smaller AI Labs. Now They’re Rivals (TIME)
I Am Once Again Asking Our Tech Overlords to Watch the Whole Movie (WIRED)
Artificial intelligence hitting labour forces like a "tsunami" - IMF Chief (Reuters)
Job picks
Some of the interesting (mostly) non-technical AI roles that I’ve seen advertised in the last week. As before, it only includes new positions that have been posted since the last TWIE (but lots of the jobs from the previous edition are still open).
Policy Analyst, AI and Emerging Technology, Steampunk (Washington D.C.)
Senior Visiting Research Fellow, Legal Priorities Project (Global, remote)
Insider Risk Investigator, OpenAI (San Francisco)
IT Specialist, Policy and Planning, Artificial Intelligence Program, NIST (US)
Audit and Compliance, Anthropic (Remote, US)
Senior AI and Information Security Analyst, RAND (US)