Public compute, pluralistic alignment, LLMs in biology [TWIE]
The Week In Examples #33 | 27 April 2024
I’m back from mountain climbing in the Scottish Highlands, which means less time spent outside drinking in the majesty of nature – but more time inside writing about AI. Truly, you win some and you lose some.
This time around, we have a new report asking some fundamental questions about efforts to build national compute capacity, research getting under the skin of the values imbued in language models, and a study looking at the use of LLMs in plant biology. As always, it’s hp464@cam.ac.uk for feedback, comments, pledges of fealty etc.
Three things
1. Public compute: What is it good for?
What happened? The Ada Lovelace Institute, a UK civil society group, released a report reviewing the role of the state in providing compute for AI development and deployment. The work, which follows the UK government’s sizable investment in its AI Research Resource (AIRR) for researchers, makes the case that investment in the public provision of compute is a welcome but insufficient step towards unlocking a “more plural, public interest model of AI development.” The report is a bit of a call to arms against the concentration of large-scale AI projects in the hands of the private sector, a concentration directly connected to the eye-watering costs of frontier model development.
What's interesting? According to the authors, a central problem with publicly minded compute initiatives is that it’s not particularly clear what type of ‘public benefit’ they are trying to realise (and, indeed, what the downstream benefit from AI development looks like more broadly). As they explain: “Governments invest in sectors like renewable energy because they produce things we want, while satisfying other policy aims such as job creation. The equivalent case for industrial investments in AI remains unclear.” Aside from the rather existential question of what this is all for, the group also encourages the government to attach conditions to the use of the AIRR that promote socially beneficial use cases (e.g. a commitment to red-teaming, or data sharing for other users), to consider onshoring the compute supply chain over the long term while providing cloud credits to use Big Tech’s infrastructure today, and to investigate industrial strategy measures to steer the AI market, such as footing the bill for public-service recommendation algorithms.
What else? I don’t agree with everything here, but I like this report because it asks a question that few seem to be bothering with: why, exactly, do governments need to build sovereign compute capacity in the first place? Now, there are lots of good answers to that question, but fundamentally they involve viewing compute as a finite resource that can be used to create useful things that we need – not just fuel for upstream experimentation in AI development (though that is of course an important element). In the future, I expect compute to occupy a position as a deeply valuable resource that powers large chunks of the economy, enriches our personal and cultural lives, and even strengthens the polity. Those outcomes, though, are contingent on carving out complementary roles for the public and private sectors (not to mention some pretty major technical advances and good old-fashioned governance measures). At the risk of kicking the can down the road, my view is that much more thought is needed about the role the state will play in the compute economy.
2. Align in the sand
What happened? Researchers from a whole bunch of organisations – including the University of Oxford and New York University on the academic side, and Meta and Cohere from the corporate world – released a new study looking at how preferences for language models differ across the world. To do that, the group compiled a database, PRISM, which includes 8,011 live conversations with 21 LLMs from 1,500 participants across 75 countries. The database represents the end result of a large-scale experiment in which participants provided details of their background, their familiarity with LLMs, and their stated preferences for fine-grained behaviours (i.e. specific information about how they want an LLM to behave).
What's interesting? The research doesn’t really get into the weeds with respect to how specific groups differ on core questions about values and beliefs. What it does do, however, is give an interesting overview of how likely certain constituencies are to raise a given issue in the first place. It found, for example, that older people (55+) are more likely to talk about elections and seek travel recommendations than younger people (18-24 years). Younger people, meanwhile, are more likely to discuss managing relationships or job searches. This might not sound like a hugely surprising finding (that is, people speak to language models about the things that are important to them), but it’s a good piece of empirical work that draws into focus the relationship between demographics and modes of usage.
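For anyone who wants to poke at the data themselves, the analysis above boils down to a frequency table of conversation topics by demographic group. Here is a minimal sketch of that tabulation in Python – not the paper’s actual code, and the file name and field names (“age_group”, “topic”) are my own assumptions about how PRISM-style records might be laid out:

```python
# Minimal sketch: tally how often each age group raises a given topic, then
# normalise to shares so differently sized groups are comparable. The input
# file and its fields are illustrative assumptions, not PRISM's real schema.
import json
from collections import Counter, defaultdict

topic_counts = defaultdict(Counter)  # age group -> topic -> count

with open("prism_conversations.jsonl") as f:
    for line in f:
        record = json.loads(line)
        topic_counts[record["age_group"]][record["topic"]] += 1

for age_group, counts in topic_counts.items():
    total = sum(counts.values())
    top_topics = [(topic, round(n / total, 3)) for topic, n in counts.most_common(5)]
    print(age_group, top_topics)
```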
What else? Today, most popular language models are ‘aligned’ with human preferences via techniques such as Reinforcement Learning from Human Feedback (RLHF). In RLHF, human raters rate or compare model responses against predefined criteria, and those judgements are used to train a reward model that shifts the kinds of responses the model is likely to produce over time. The rub, though, is that this process tends to align a model with ‘revealed preferences’ (i.e. what a person likes or does in practice) rather than ‘stated preferences’ (i.e. what a person says their preferences are). Revealed preferences might sound like the more honest signal, but relying on them gets us back to the unfortunate trade-off between first- and second-order preferences I talked about last time around, in which people think one thing about themselves (e.g. I want to eat healthily) but decide another in the moment (e.g. I am going to eat the pizza rather than the salad). On the flipside, though, stated preferences are not perfect either: you might tell your colleagues you appreciate honesty to save face when in reality you struggle with direct feedback. The upshot is that both approaches, revealed and stated preferences alike, have problems (for a much more comprehensive rundown of this tension, I’d recommend an excellent paper on AI and value alignment from 2020).
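For the technically inclined, here is a minimal sketch of the preference-modelling step that sits at the heart of RLHF – not any lab’s actual implementation. Given a rater’s choice between two responses, a reward model is trained so that the preferred response scores higher than the rejected one (a Bradley-Terry-style pairwise loss); the tiny bag-of-embeddings scorer below stands in for the large transformer a real pipeline would use:

```python
# Sketch of reward-model training from pairwise preferences, under the
# assumptions stated above. Not production code.
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    def __init__(self, vocab_size: int = 1000, dim: int = 32):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # crude stand-in for a transformer encoder
        self.score = nn.Linear(dim, 1)                 # pooled representation -> scalar reward

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> one scalar reward per response
        return self.score(self.embed(token_ids)).squeeze(-1)

model = TinyRewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch of token ids for responses a rater preferred vs. rejected
chosen = torch.randint(0, 1000, (8, 20))
rejected = torch.randint(0, 1000, (8, 20))

# Bradley-Terry-style loss: push reward(chosen) above reward(rejected)
loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
print(f"pairwise loss: {loss.item():.3f}")
```

In a full pipeline, the trained reward model would then steer the language model itself via reinforcement learning, which is exactly where the revealed-versus-stated question bites: the reward model only knows what raters did, not what they say they want.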
3. LLMs put down roots in plant biology
What happened? It is AI’s blessing and its curse that it has the potential to be put to use in more or less any domain you can imagine. It is a technology whose use could make just about anything better, or just about anything worse. That dynamic, though, means I get to learn things that I would otherwise have no idea about. In this week’s final example, that thing is plant stress. At its most basic level, plant stress is the process whereby plants suffer from environmental factors that affect their growth, development or productivity. These stress factors can be physical, chemical or biological, such as extreme temperatures, water scarcity, soil contamination, or pests and diseases. The repercussions of these stressors extend beyond individual plants, influencing entire ecosystems and, with them, the global environment writ large.
What's interesting? For that reason, researchers from Nanyang Technological University in Singapore and Saudi Arabia’s KAUST looked at the ways in which AI “allows scientists to rapidly screen through massive and complex datasets to uncover elusive patterns in the data, enabling us to create more robust and faster models for prediction and hypothesis generation in a bid to develop more stress-resilient plants.” In simple terms, the group used LLMs to assess 2,000 research papers to determine which methods were most commonly used to study which stressors. This type of large model-powered meta-analysis follows similar efforts in agriculture, finance, biomedical research, and materials chemistry.
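To give a flavour of how this sort of LLM-powered meta-analysis works in practice, here is a hedged sketch of a pipeline that asks a model to tag each abstract with the stressor studied and the method used, then tallies the pairs. The prompt wording, label examples, model name, and input file are my own illustrative choices rather than the authors’ actual setup:

```python
# Sketch of an LLM-assisted literature tagging pipeline, under the
# assumptions described above (file, fields, prompt and model are placeholders).
import json
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "From this plant-biology abstract, return JSON with two keys: "
    '"stressor" (e.g. drought, heat, salinity, pathogen) and '
    '"method" (e.g. GWAS, transcriptomics, machine learning).\n\nAbstract: {abstract}'
)

def tag_abstract(abstract: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": PROMPT.format(abstract=abstract)}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# abstracts.jsonl is a hypothetical file of {"abstract": "..."} records
pairs = Counter()
with open("abstracts.jsonl") as f:
    for line in f:
        tags = tag_abstract(json.loads(line)["abstract"])
        pairs[(tags["stressor"], tags["method"])] += 1

print(pairs.most_common(10))
```

Scaled up to a couple of thousand papers, a tally like this is what lets you say which methods dominate the study of which stressors.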
What else? Proponents of AI’s use in science like to describe AI as a tool for making sense of lots of complex, messy data. In this framing, AI is used to extract a signal from the noise to allow researchers to make sense of the natural world. I think this is probably true, though I am also sensitive to the idea that – taken to the extreme – this dynamic may actually end up limiting understanding. The idea here is that, much like an overreliance on GPS may degrade a person’s sense of direction (speaking for myself here), overuse of AI in the sciences may eventually lead to a process of deskilling in which AI understands the world, but we don’t. While I am not sure I completely buy the idea that, to paraphrase Henry Kissinger, this is “how the Enlightenment ends,” I do believe that there is no substitute for doing the work when you want to understand something.
Best of the rest
Friday 26 April
Beijing city to subsidise domestic AI chips, targets self-reliance by 2027 (Reuters)
Microsoft, Google post double-digit profit rises, boosting case for AI (Al Jazeera)
Disgruntled school worker accused of using AI to create fake recording of principal on racist rant (Sky News)
It’s not only AI that hallucinates (FT)
Rishi Sunak promised to make AI safe. Big Tech’s not playing ball. (POLITICO)
Thursday 25 April
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving (arXiv)
Self-driving cars are underhyped (Substack)
Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents (arXiv)
The tech wars are about to enter a fiery new phase (The Economist)
Drawing the Line: Deep Segmentation for Extracting Art from Ancient Etruscan Mirrors (arXiv)
Meta AI spending plans cause share price slump (BBC)
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare (arXiv)
Wednesday 24 April
Humanoid Robots Wiki (Wiki)
Classifying Human-Generated and AI-Generated Election Claims in Social Media (arXiv)
Augment Inc. Raises $227 Million at $977 Million Valuation to Empower Software Teams With AI (Augment)
Deepfakes and Higher Education: A Research Agenda and Scoping Review of Synthetic Media (arXiv)
Anduril Selected for U.S. Air Force Collaborative Combat Aircraft Program (Anduril)
Moderna and OpenAI Collaborate To Advance mRNA Medicine (Moderna)
CMA seeks views on AI partnerships and other arrangements (CMA)
Tuesday 23 April
Simple probes can catch sleeper agents (Anthropic)
Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance (arXiv)
The benefits, risks and bounds of personalizing the alignment of large language models to individuals (Nature)
Top tech companies seek more cash for the lab keeping AI safe (Washington Post > Letter)
Microsoft launches Phi-3, its smallest AI model yet (The Verge > arXiv)
A National Security Insider Does the Math on the Dangers of AI (WIRED)
Monday 22 April
Holistic Safety and Responsibility Evaluations of Advanced AI Models (arXiv)
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions (arXiv > OpenAI)
Actually After Hours #3 with Dwarkesh (Podcast, X)
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI (Google DeepMind)
Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences (bioRxiv > X)
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preference (arXiv)
Job picks
As always, these are some of the interesting (mostly) non-technical AI roles that I’ve seen advertised in the last week. Just like last time, it only includes new roles that have been posted since the last TWIE.
Operations Associate, Epoch (Remote)
Researcher, EU Public Policy, Ada Lovelace Institute (Brussels)
Associate Director, Artificial Intelligence Policy, Federation of American Scientists (US)
AGI Safety Manager, Google DeepMind (UK)
AI Reporter, Fortune (US)
Research Associates, Frontier Model Forum (US and UK)
AI Impact and Evaluation Lead, AISI (UK)
Consultant, Ethics of AI, UN (Remote)
Senior Policy Advisor, International Partnerships, AISI (UK)
Program Analyst, Artificial Intelligence, US Government (US)