The Week in Examples #7 [7 October]
The Safety Summit cometh, challenges for evaluations, and the participatory turn
Another week, another rundown of the most important news in AI safety, governance, and policy. Time may well be a flat circle, which is why I am once again reminding you that we have a bit of analysis up front and links at the back.
As always, make sure to tell me what works and what doesn’t, or just drop me a line to say hello at hp464@cam.ac.uk. It occurs to me that I rarely say that for the essays I write, but rest assured I want to hear what you have to say about those too!
Three things
1. Apollo makes recommendations for the UK AI Safety Summit
What happened? Apollo, the UK-based safety group, released its recommendations for next month’s AI Safety Summit. The organisation wants the event to produce alignment on government-led risk classification for AI systems, a shared understanding of the responsibilities of all actors within the AI value chain (including developers of narrow and general-purpose AI), and cooperation between governments on compliance and enforcement.
What’s interesting? The group also made one recommendation for the Frontier AI Taskforce to undertake ahead of the summit and two for it to consider once the event is underway. The first, to take place before the summit, is to develop educational materials for policymakers focused on “embedding safety across the life-cycle” of AI systems. These materials would cover pre-training assessments and the benefits and costs of staged releases (and remind me of POSTnotes from the Parliamentary Office of Science and Technology). As for during the summit, Apollo suggests a combination of demonstrations of a range of AI safety risks (and how evaluations can identify and help address them) and sessions unpicking the role a watchdog could play in using compute to identify AI systems that require more robust oversight.
What else? I like how practical these recommendations are given that the summit is just a few weeks away. The proposal to use the Frontier AI Taskforce as a resource for education is an interesting one, given that role has usually been played by a combination of arm’s-length groups like the Alan Turing Institute, in-house specialists like the Parliamentary Office of Science and Technology, or various other bits of the civil service machine. Over the long run, I suspect the Frontier AI Taskforce may perform some of that work, but right now I am unsure whether it is a function it is likely to prioritise. On an unrelated note, I promise this is the last time I write about the summit until it takes place!
2. Challenges in evaluating AI systems
What happened? Anthropic discussed challenges in evaluating AI systems, including the implementation of benchmarks, the subjectivity of human-led evaluations, and the risks of relying too heavily on model-generated approaches. In response, it calls on the US government to fund the science of repeatable and useful evaluations, the implementation of existing evaluations, and programmes to analyse the robustness of those evaluations.
What’s interesting? The post comes as Arvind Narayanan and Sayash Kapoor of the AI Snake Oil blog released annotated notes for their talk on why evaluating LLMs is a ‘minefield’. The authors highlight three key challenges in evaluating large language models: prompt sensitivity, construct validity, and data contamination. First, results can be an artefact of prompting techniques rather than intrinsic model properties, as illustrated by difficulties replicating claims of political bias. Second, evaluations of abstract attributes like bias often lack construct validity, failing to model real-world behaviour and relying on abilities like test-taking that don’t reflect human competencies. Finally, contamination between training and testing data has long affected evaluation, undermining reproducibility.
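To make the contamination point concrete, here is a minimal sketch, in Python, of the kind of n-gram overlap check used to flag test questions that may have leaked into a training corpus. It is not taken from either post; the `ngrams` and `contamination_rate` helpers and the toy corpus are my own illustrative assumptions, and real checks run over full training corpora with longer n-grams.

```python
# Minimal sketch of an n-gram overlap check for benchmark contamination.
# Toy data only; production checks scan full training corpora and
# typically use longer n-grams (roughly 8-13 words).
import re


def ngrams(text: str, n: int) -> set:
    """Return the set of word-level n-grams in a text (lowercased, punctuation stripped)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_items: list, training_corpus: str, n: int = 4) -> float:
    """Fraction of test items sharing at least one n-gram with the training corpus."""
    train_ngrams = ngrams(training_corpus, n)
    flagged = sum(1 for item in test_items if ngrams(item, n) & train_ngrams)
    return flagged / len(test_items) if test_items else 0.0


if __name__ == "__main__":
    corpus = (
        "the quick brown fox jumps over the lazy dog "
        "what is the capital of france the capital of france is paris"
    )
    benchmark = [
        "What is the capital of France?",     # overlaps the corpus, so it is flagged
        "Name the largest moon of Jupiter.",  # no overlap, so it passes
    ]
    print(f"Contamination rate: {contamination_rate(benchmark, corpus):.0%}")  # -> 50%
```

A benchmark item that trips a check like this may be answered from memorisation rather than capability, which is exactly why contaminated test sets undermine reproducibility.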
What else? The AI Snake Oil talk is a great piece of work, but I do have a bit of an issue with the ‘construct validity’ argument insofar as it describes political bias as primarily a joint phenomenon of the model and its use. All technologies are political artefacts, and the values, beliefs, and so on encoded within them are decoded at the point at which a technology comes into contact with the world. Yes, we could wall off a biased system to prevent the hardening of particular world models, but that wouldn’t make it any less biased in substance. The problem is that the constructivist view risks shunting responsibility down the value chain, away from the developer and towards the user.
3. Participatory design under scrutiny
What happened? Researchers considered the limits of participatory approaches in a new paper assessing what they describe as the “participatory turn” in AI design, defined as the response to calls to involve members of communities impacted by AI systems in their design. They survey a range of participatory approaches, including (among many others) user-centred design, service design, co-design, participatory action research, and value-sensitive design. The authors also discuss participatory democracy and civic participation, which refer to processes that involve citizens and stakeholders in civic decision-making.
What’s interesting? Across each of these areas (and the many others I don’t mention), the researchers sketch what they term “dimensions of participation” based on four main types: consultation, inclusion, collaboration, and ownership. The idea is that third-party ownership of the design is the most extensive type of participation, while consultation is the least extensive. Applying this framework to 80 papers describing participatory processes in AI, they find that the vast majority of research either consults or collaborates with users. Only a tiny handful of projects allow users to shape a system’s scope and purpose, determine whether it should be built at all, or play a central role across the system’s lifecycle.
What else? AI labs are accelerating efforts to incorporate public input through initiatives spanning alignment assemblies (OpenAI and Anthropic), community forums (Meta AI), and work to boost democratic deliberation (Google DeepMind). Relatedly, Anthropic released research on scalable deliberation with pol.is as well as a project aiming to understand which values are encoded in large models. Amidst these moves, however, researchers from the Ada Lovelace Institute have argued, in a similar manner to the work described above, that the majority of participatory AI efforts do not engender partnerships that empower those involved.
Best of the rest
Friday 6 October
Autonomous AI systems in the face of liability, regulations and costs (Nature)
UK data watchdog issues Snapchat enforcement notice over AI chatbot (The Guardian)
Treason case: What are the dangers of AI chatbots? (BBC)
Broken 'guardrails' for AI systems lead to push for new safety measures (FT)
ChatGPT-owner OpenAI is exploring making its own AI chips (Reuters)
Thursday 5 October
AI: Voice cloning tech emerges in Sudan civil war (BBC)
Evaluating the historical value misspecification argument (LessWrong)
Getty Images CEO Craig Peters has a plan to defend photography from AI (The Verge)
Meta and X questioned by lawmakers over lack of rules against AI-generated political deepfakes (AP News)
Bill Gates-Backed Startup Launches AI Chatbot for Personalized Movie, Book Picks (The Wall Street Journal)
Wednesday 4 October
AI’s Present Matters More Than Its Imagined Future (The Atlantic)
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (Anthropic)
Generative AI Is Coming for Sales Execs’ Jobs—and They’re Celebrating (Wired)
AI threatens to dethrone the 4-year college degree (Axios)
Dell's revenue forecast signals AI boost will take longer to materialize (Reuters)
Tuesday 3 October
Guidance on using generative AI at the BBC (BBC)
Language Models Represent Space and Time (arXiv)
The governance of AI systems (Blair Attard-Frost)
How to Promote Responsible Open Foundation Models (Stanford)
Representation Engineering: A Top-Down Approach to AI Transparency (arXiv)
Monday 2 October
Global AI governance: barriers and pathways forward (OII)
Meta says its AI trains on your Instagram posts (Axios)
Why Big Tech's bet on AI assistants is so risky (MIT Tech Review)
No formal investigation into AI chips, EU antitrust regulators say (Reuters)
JPMorgan’s Dimon Predicts 3.5-Day Work Week for Next Generation Thanks to AI (Bloomberg)
One for the road
