Benjamin Franklin famously said “in this world, nothing is certain except death and taxes.” It is in that spirit that I once again share with you The Week In Examples, written on a rainy Saturday morning in Cambridge, England.
This time around we have new research from RAND about the potential for AI models to help bad actors create bioweapons, new work from Anthropic and the Collective Intelligence Project, and the release of Stanford’s inaugural Foundation Model Transparency Index.
As always, make sure to tell me what works and what doesn’t or just drop me a line to say hello at hp464@cam.ac.uk.
Three things
1. Schrödinger's biorisk
What happened? A new report from RAND assessed how likely state-of-the-art models are to assist bad actors in creating a bioweapon, and how more powerful models might exacerbate biorisk in the future. The report found that while large language models cannot yet explicitly generate instructions for building a biological weapon, they can offer concerning guidance that could assist those seeking to plan and execute biological attacks. The researchers found that large models discussed topics like obtaining pathogens, distributing them to cause an outbreak, and acquiring botulinum toxin (a highly potent neurotoxic protein produced by the bacterium Clostridium botulinum) under the guise of legitimate research.
What’s interesting? The primary way in which LLMs could increase the likelihood of an attack taking place, as the report acknowledges, is by filling the knowledge gaps involved in creating a bioweapon. An important question here is to what extent these models will be able to provide information that is not already available via the internet. As the authors explain, “it remains an open question whether the capabilities of existing LLMs represent a new level of threat beyond the harmful information that is readily available online.”
What else? I am generally in two minds when it comes to biorisk. It seems self-evident that if we reduce barriers to knowledge we ought to expect more bioweapon attacks to take place. But the internet provides plenty of sources of information for bad actors to use, and the number of attacks has not dramatically increased in the last 20 years. So what’s going on? Well, when I speak to biologists they generally tell me that the combination of strong restrictions on raw materials, less severe restrictions related to equipment and lab space, and the technical know-how needed to create a bioweapon seem to have done enough to mitigate biorisk in the internet era. Perhaps that will hold true in the age of large models.
2. Claude’s constitution gets an upgrade
What happened? Anthropic, the group behind Claude, and the Collective Intelligence Project announced the results of an experiment in which ~1,000 Americans collectively drafted a constitution to align the values of a version of Anthropic’s Claude Instant chatbot. Here, ‘constitution’ refers to the normative principles that Anthropic embedded in Claude by using AI feedback to evaluate the model’s outputs against them. Claude’s original constitution was made public in May 2023 and was based on sources including the Universal Declaration of Human Rights, Apple’s Terms of Service, and rules developed by Google DeepMind for its Sparrow system.
What’s interesting? Participants contributed 1,127 statements via the online deliberation platform Polis, casting 38,252 votes at an average of 34 votes per person. The result was a new set of principles that maintained a 50% overlap with the existing constitution. Anthropic said a few key differences stood out: “principles in the public constitution appear to largely be self-generated and not sourced from existing publications, they focus more on objectivity and impartiality, they place a greater emphasis on accessibility, and in general, tend to promote desired behavior rather than avoid undesired behavior.” You can see a comparison of the new and old constitutions here.
What else? Anthropic tested the new constitution by training a version of the Claude Instant chatbot against it and comparing that model with the vanilla version. In doing so, the researchers found that the public and standard models performed similarly on language and maths tests, and were rated as equivalently helpful and harmless by users. However, the public model showed less bias across nine social dimensions according to the BBQ evaluation, even though both models reflected similar political ideologies based on the OpinionQA benchmark. My own view here is that this is important work. As models diffuse throughout the economy, the question of who exactly they are for – and who they are representative of – will become more relevant. Something to keep an eye on, though, will be whether Anthropic uses this technique (or a version of it) for the successor to the state-of-the-art Claude model.
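For the technically minded, here is a rough sketch of what the critique-and-revise loop behind a constitution can look like in practice. It is an illustration under my own assumptions rather than Anthropic’s code: `generate` is a stand-in for any model call, and the principles and function names are invented for the example.

```python
# Schematic constitutional-style critique-and-revise loop (illustration only;
# not Anthropic's implementation). `generate` stands in for a real LLM call.

def generate(prompt: str) -> str:
    """Placeholder model call: a real implementation would query an LLM."""
    return prompt.splitlines()[-1]  # here, simply echo the final instruction line

PRINCIPLES = [
    "Choose the response that is least likely to be harmful or offensive.",
    "Choose the response that most promotes accessibility and impartiality.",
]

def constitutional_revision(user_prompt: str, principles: list[str]) -> str:
    draft = generate(user_prompt)
    for principle in principles:
        # Ask the model to critique its own draft against one principle...
        critique = generate(
            f"Principle: {principle}\nResponse: {draft}\n"
            "Identify any way in which the response conflicts with the principle."
        )
        # ...then to rewrite the draft in light of that critique.
        draft = generate(
            f"Principle: {principle}\nResponse: {draft}\nCritique: {critique}\n"
            "Rewrite the response so that it satisfies the principle."
        )
    return draft  # (draft, revision) pairs can then be used as training feedback

print(constitutional_revision("How should I reply to an angry customer?", PRINCIPLES))
```

The point of the sketch is that swapping in the publicly drafted principles changes the feedback the model receives during training, rather than adding a rule applied at query time.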
3. Transparency index looks through popular models
What happened? Stanford University released the inaugural version of its Foundation Model Transparency Index, which scores ten major developers across 100 indicators to assess the transparency of their models. The indicators span upstream resources like data and compute, details about the models themselves, and downstream practices related to distribution and societal impact.
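To make the scoring concrete, here is a minimal sketch of how a score built from binary indicators grouped by domain might be aggregated. The indicator names and data below are invented for illustration and are not drawn from the index itself.

```python
# Illustrative aggregation of a transparency score from binary indicators
# (invented example data; not the Foundation Model Transparency Index's code).
from collections import defaultdict

# (domain, indicator, satisfied?) -- a hypothetical slice of one developer's scorecard
indicators = [
    ("upstream", "training data sources disclosed", 1),
    ("upstream", "data labour practices disclosed", 0),
    ("model", "model architecture disclosed", 1),
    ("downstream", "societal impact reporting", 0),
]

def transparency_score(scorecard):
    per_domain = defaultdict(lambda: [0, 0])  # domain -> [satisfied, total]
    for domain, _, satisfied in scorecard:
        per_domain[domain][0] += satisfied
        per_domain[domain][1] += 1
    overall = sum(satisfied for _, _, satisfied in scorecard)
    return overall, dict(per_domain)

overall, per_domain = transparency_score(indicators)
print(f"{overall}/{len(indicators)} indicators satisfied;", per_domain)
```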
What’s interesting? The results show what the researchers characterise as “pervasive opacity” across the AI industry, especially regarding upstream data sources, data labour practices, and downstream societal impacts (though some developers do score highly on certain indicators). The index establishes a baseline for tracking progress on transparency over time and offers recommendations, including that lawmakers should make a broad conception of transparency a top policy priority, that developers should share more information about how their models impact the AI supply chain, and that deployers should bake the risks of opacity into their assessments.
What else? Transparency is widely seen as a good thing in many walks of life, and science and technology are no different. For AI, the benefits of transparent approaches include boosting accountability, fostering innovation, building public trust, and facilitating effective governance and regulation. But transparency, at least as defined in the report, has a cost. The index also counts sharing details of development methods and providing mechanisms for access towards its scores, which means firms like Meta and Stability come out on top because they allow others to download their models in full. Unfortunately, as we saw recently, an open-source model can be fine-tuned to remove any safety guardrails placed on it by its developers. Goodbye “as a language model, I cannot…” and hello help with phishing attacks, pyramid schemes, and constructing homemade explosives.
Best of the rest
Friday 20 October
Is AI alignment on track? Is it progressing... too fast? (Alexey Guzey)
Eureka! NVIDIA Research Breakthrough Puts New Spin on Robot Learning (NVIDIA)
Independent report finds UK leads the way with AI Standards Hub (UK Gov)
‘Here is the news. You can’t stop us’: AI anchor Zae-In grants us an interview (The Guardian)
Clearview AI Successfully Appeals $9 Million Fine in the U.K. (The New York Times)
Thursday 19 October
AI Act: EU Parliament’s legal office gives damning opinion on high-risk classification ‘filters’ (Euractiv)
Living guidelines for generative AI — why scientists must oversee its use (Nature)
WHO outlines considerations for regulation of artificial intelligence for health (WHO)
Researchers Say Guardrails Built Around A.I. Systems Are Not So Sturdy (New York Times)
Mustafa Suleyman and Eric Schmidt: We need an AI equivalent of the IPCC (FT)
Wednesday 18 October
Sociotechnical Safety Evaluation of Generative AI Systems (arXiv)
Belt and road forum: China launches AI framework, urging equal rights and opportunities for all nations (SCMP)
Fuyu-8B Model Card (Hugging Face)
UK’s global AI summit must provide solutions rather than suggestions (New Scientist)
Anthropic’s AI chatbot Claude is posting lyrics to popular songs, lawsuit claims (CNBC)
Tuesday 17 October
Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown (Misinformation Review)
A 30% Chance of AI Catastrophe: Samotsvety's Forecasts on AI Risks and the Impact of a Strong AI Treaty (TAISC)
XSTEST: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models (arXiv)
‘AI Godfather’ Yoshua Bengio: We need a humanity defense organization (Bulletin of the Atomic Scientists)
Monday 16 October
New innovation challenge launched to tackle bias in AI systems (UK Gov)
How ChatGPT is transforming the postdoc experience (Nature)
Review of new Chinese ‘red-teaming’ AI legislation (X)
ChatGPT may be better than a GP at following depression guidelines - study (The Guardian)
India's AI vision calls for 80 exaflops of AI infrastructure build (The Register)