Machine understanding, democracy, and video generation [TWIE]
The Week In Examples #35 | 11 May 2024
I’m in Toulouse this weekend, so if this edition is a little rougher than usual you can blame that on the Pink City (what they call Toulouse, apparently, because of the terracotta bricks used for its buildings). In any case, this time around we have the usual jobs, links, and another summary of three papers: an essay on machine understanding, work assessing language models in democratic deliberation, and research setting out some of the factors that determine how likely people are to use video generation models.
As a rule, I’m trying to write less on the big announcements—I suspect you can all get commentary on those elsewhere—and more on research and analysis that you might not have seen. That being said, I have to acknowledge AlphaFold 3 from Google DeepMind and OpenAI’s new work explaining how they determine model behaviour. Both are worth checking out. As usual, it’s hp464@cam.ac.uk for feedback, comments or anything else. Keep the messages coming!
Three things
1. Understanding machine understanding
What happened? Herbert Roitblat, an American author, wrote a short essay about machine understanding. The basic idea, which some commentators like to suggest, is that large language models do not ‘know’ what a given word represents, only that it tends to come before some words and after others. On this view, the model merely aggregates statistical relationships between patterns, and not much else. Roitblat’s paper summarises arguments for and against this idea. In response, it calls for a new wave of work to foreground the so-called symbol grounding problem (how to connect representations to the things they actually represent) in language model research. To do that, it suggests researchers ought to look at things like whether models are capable of understanding object permanence, whether they can represent causal relations, and whether they are capable of the “representing of meaning.” He doesn’t state his preferred methodological approach outright, but essentially he, like many others, is calling for better evaluations. The difference is that these evaluations would focus specifically on understanding, not on things like capabilities performance, extreme risks like biorisk or cybersecurity, or the structural impact of models on society.
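For a sense of what such an evaluation might look like in practice, here is a minimal sketch of an object-permanence style probe. This is my own illustration rather than anything from Roitblat’s essay, and `ask_model` is a hypothetical stand-in for whichever chat API you would actually call.

```python
# Minimal sketch (my own illustration, not from Roitblat's essay) of an
# "object permanence" style probe for a language model.

def ask_model(prompt: str) -> str:
    # Hypothetical placeholder: wire this up to a real model API.
    # Here it just returns a canned answer so the script runs end to end.
    return "the box"

PROBES = [
    {
        "prompt": (
            "I put a coin in a box, close the lid, and carry the box into "
            "another room. Where is the coin now? Answer in a few words."
        ),
        "accept": ["box"],  # the coin should still be in the (moved) box
    },
    {
        "prompt": (
            "A ball rolls behind a sofa and nothing touches it. "
            "Does the ball still exist? Answer yes or no."
        ),
        "accept": ["yes"],
    },
]

def run_probes() -> float:
    """Score the model by keyword match against the expected answers."""
    hits = 0
    for probe in PROBES:
        answer = ask_model(probe["prompt"]).lower()
        hits += any(keyword in answer for keyword in probe["accept"])
    return hits / len(PROBES)

if __name__ == "__main__":
    print(f"object-permanence probe accuracy: {run_probes():.0%}")
```

A real evaluation would obviously need far more items and a less brittle scoring rule than keyword matching, but the shape (scenario, question, expected behaviour) is the point.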
What's interesting? The analysis centres on two well-known stories about the inner workings of the mind. First, there is behaviourism, which argues that human behaviour is acquired through a process of reinforcement and punishment. This school of thought was championed by B.F. Skinner, who wrote extensively on the subject beginning in the 1930s (he also wrote the novel Walden Two, which is a fun if bizarre read). Second, there is cognitivism, which proposes that we do things because we think about them, not simply in response to external stimuli. The essay also acknowledges that this debate goes back much further than the 20th century, which is something I wrote about in my history of nativism and empiricism last year.
What else? The paper puts forward an interesting idea inspired by the philosopher John Locke: “Language tokens are signs for ideas. Put more generally, this approach argues that the speaker has some idea in mind, selects tokens conditional on that idea. The listener receives those tokens and selects an idea conditional on those words and on the listener’s expectations.” In this sense, language models are at the very least doing half of what Locke had in mind by virtue of their ability to communicate concepts to others (regardless of whether they understand those concepts themselves). For language models, the debate comes down to whether these symbols can be learned simply through building bigger systems, or whether a fundamentally new approach is needed. My own view is somewhere in the middle. It seems unlikely that deep understanding can be achieved through scaling alone, but today’s models do seem to be capable of some limited forms of understanding. That shouldn’t be possible if scale were a complete dead end (regardless of whether the models are just simulating this process).
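To make the Locke-style picture concrete, here is a toy sketch that treats the speaker as a conditional distribution over tokens given an idea, and the listener as a Bayesian update over ideas given a token. This is my own gloss rather than the paper’s formalism, and the ideas (“dog”, “cat”), tokens, and probabilities are made up for illustration.

```python
# Toy sketch (my own reading of the Locke-style picture quoted above):
# the speaker samples tokens conditional on an idea; the listener infers
# the idea conditional on the tokens and on its prior expectations.

# P(token | idea): how a speaker with a given idea tends to talk about it.
SPEAKER = {
    "dog": {"bark": 0.6, "fur": 0.3, "meow": 0.1},
    "cat": {"meow": 0.6, "fur": 0.3, "bark": 0.1},
}
PRIOR = {"dog": 0.5, "cat": 0.5}  # the listener's expectations before hearing anything

def listener_posterior(token: str) -> dict[str, float]:
    """P(idea | token) via Bayes' rule over the speaker model and the prior."""
    unnormalised = {idea: PRIOR[idea] * SPEAKER[idea].get(token, 0.0) for idea in PRIOR}
    total = sum(unnormalised.values())
    return {idea: p / total for idea, p in unnormalised.items()}

print(listener_posterior("meow"))  # roughly {'dog': 0.14, 'cat': 0.86}
```

Whether a language model that implements something like the speaker half also does the listener half, with an “idea” sitting behind the tokens, is exactly the point in dispute.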
2. AI for democracy gets vote of confidence
What happened? Toulouse-based researchers published work exploring whether it is possible to use language models to create “personalized digital-twins to act as intermediators or assistants augmenting the participatory ability of each voter.” To do that, they asked volunteers to select among 67 policies extracted from the government programmes of Brazil’s two main presidential candidates (Luiz Inácio “Lula” da Silva and Jair Bolsonaro) and used those choices to fine-tune four popular LLMs: Llama 2 7B, GPT-3.5 Turbo, Mistral 7B, and Falcon 7B. They found that models were, at least in a very limited way, capable of standing in for real people to help scale deliberation beyond the bounds of what would be feasible within the normal political process.
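For a rough sense of how pairwise policy choices might be turned into training data for a “digital twin”, here is a sketch. The field names, prompts, and example policies are invented for illustration; the authors’ actual pipeline may look quite different.

```python
# Rough sketch (my reconstruction, not the authors' code) of turning pairwise
# policy choices into prompt/completion pairs for fine-tuning a chat model.

import json

# Hypothetical data: each volunteer picked one of two policies in several rounds.
choices = [
    {"voter_id": 17, "policy_a": "Expand free school meals",
     "policy_b": "Cut fuel taxes", "chosen": "policy_a"},
    {"voter_id": 17, "policy_a": "Raise the minimum wage",
     "policy_b": "Freeze public hiring", "chosen": "policy_a"},
]

def to_finetune_records(rows):
    """Format each pairwise choice as a prompt/completion example."""
    for row in rows:
        prompt = (
            f"You are a digital twin of voter {row['voter_id']}. "
            f"Which policy do they prefer?\n"
            f"A: {row['policy_a']}\nB: {row['policy_b']}\nAnswer A or B."
        )
        completion = "A" if row["chosen"] == "policy_a" else "B"
        yield {"prompt": prompt, "completion": completion}

with open("preferences.jsonl", "w") as f:
    for record in to_finetune_records(choices):
        f.write(json.dumps(record) + "\n")
```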
What's interesting? The group assessed how well the AI models could guess the preferences of individuals they hadn't seen before. The results, though, were a bit so-so: the models correctly guessed which of two policy options a person preferred 69-76% of the time. After these initial results, the researchers used the models to fill in gaps in their preference data, taking a small subset of responses and asking another model to generate new preference information. They applied this data to the full group of participants to analyse aggregate preferences in the round, which they did by comparing how likely one policy proposal was to be preferred over another in a direct comparison across the whole group (i.e. not just for individuals). Using this method, the researchers were able to boost the accuracy with which the sample's aggregate preferences were predicted from 30% to 75%.
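The aggregate step can be pictured as a simple pairwise tally across participants. The sketch below is an assumed simplification on my part (the predicted rankings and policy names are made up), not the paper’s actual method.

```python
# Simplified sketch (assumed, not from the paper) of aggregating individual
# predictions: for each pair of policies, count how often one is predicted
# to beat the other across the whole group of participants.

from itertools import combinations

# Hypothetical predicted preferences: participant -> ranked list (best first).
predicted = {
    "p1": ["school meals", "minimum wage", "fuel tax cut"],
    "p2": ["minimum wage", "school meals", "fuel tax cut"],
    "p3": ["school meals", "fuel tax cut", "minimum wage"],
}

def pairwise_win_rates(rankings):
    """Share of participants predicted to prefer policy a over policy b."""
    policies = sorted({p for ranking in rankings.values() for p in ranking})
    rates = {}
    for a, b in combinations(policies, 2):
        wins = sum(r.index(a) < r.index(b) for r in rankings.values())
        rates[(a, b)] = wins / len(rankings)
    return rates

for (a, b), rate in pairwise_win_rates(predicted).items():
    print(f"{a} preferred to {b} by {rate:.0%} of participants")
```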
What else? AI for democracy is the flavour of the week. We mostly hear about how AI is spreading misinformation to destabilise the political process, though there's not much evidence this is actually happening. Whatever the case, this sort of work is about showing how AI may actually strengthen the polity. The basic idea behind using AI to enrich the political process is that, as this research aims to show, it may be possible to create proxies for people's views that can hash out differences on their behalf. Bob has a language model that knows his preferences. So does Alice. Bob's model talks to Alice's model and finds areas of agreement and divergence. Scale that up to millions of people, and you get the idea.
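In code, the Bob-and-Alice picture is little more than a comparison of two preference profiles; the hard part is producing and reconciling those profiles. A toy illustration, entirely my own, with invented positions:

```python
# Toy illustration (my own, not from the paper) of two "digital twins"
# finding agreement and divergence between their owners' stated positions.

bob_twin   = {"school meals": "for", "fuel tax cut": "against", "minimum wage": "for"}
alice_twin = {"school meals": "for", "fuel tax cut": "for",     "minimum wage": "for"}

agreements  = {p for p in bob_twin if bob_twin[p] == alice_twin.get(p)}
divergences = {p for p in bob_twin if bob_twin[p] != alice_twin.get(p)}

print("agree on:", agreements)     # {'school meals', 'minimum wage'}
print("diverge on:", divergences)  # {'fuel tax cut'}
```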
3. Will anyone use Sora?
What happened? In the final paper of the week, researchers from Shenzhen University looked at what drives users' willingness to adopt text-to-video models such as OpenAI's Sora. They used the Unified Theory of Acceptance and Use of Technology (UTAUT) model, which aims to explain individuals' intentions to adopt technology by assessing things like the specific benefits of using a technology, the degree of ease associated with its use, and the extent to which consumers perceive that important others (e.g. family and friends) believe they should use it. Using this model (and adding in a few new factors), they found that realistic-looking outputs and the perceived novelty of the technology are the most important factors in determining usage.
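As a crude illustration of the UTAUT-style framing, one can picture intention to use as a weighted combination of survey constructs. The real paper will use proper measurement and structural models; the weights and factor names below are made up for the sake of the sketch.

```python
# Hedged sketch (my own, loosely following the UTAUT framing described above,
# not the paper's actual model): intention to use as a weighted sum of factors.
# All weights and factor names are invented for illustration.

WEIGHTS = {
    "performance_expectancy": 0.30,  # how useful the tool seems
    "effort_expectancy":      0.15,  # how easy it seems to use
    "social_influence":       0.15,  # whether important others endorse it
    "perceived_realism":      0.25,  # how realistic the generated video looks
    "perceived_novelty":      0.15,  # how novel the technology appears
}

def intention_score(responses: dict[str, float]) -> float:
    """Weighted average of survey responses, each on a 1-7 Likert scale."""
    return sum(WEIGHTS[factor] * responses[factor] for factor in WEIGHTS)

example_respondent = {
    "performance_expectancy": 5.0,
    "effort_expectancy": 6.0,
    "social_influence": 4.0,
    "perceived_realism": 6.5,
    "perceived_novelty": 6.0,
}
print(f"intention to use: {intention_score(example_respondent):.2f} / 7")  # 5.53 / 7
```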
What's interesting? This is one of the first studies to empirically investigate user adoption of text-to-video AI models. It highlights the importance of the visual realism of AI-generated videos in driving user acceptance, arguing that, perhaps unsurprisingly, the extent to which video generation models represent an improvement on existing technologies is a key factor in encouraging use. Assuming these results hold in the real world, the work seems to indicate that we ought to expect a lot of people to be willing to use realistic video generation technologies like Sora. The problem, though, is that this assumes there are lots of people who have a good use for the technology. It is one thing to say that someone who wants to make a video is comfortable using Sora, and another to say that a person wants to make a video in the first place. If I had to guess, I suspect that in the short term we'll see adoption rates similar to those of image generation, which a 2023 UK study found lags language model usage by half (though, to be clear, this is still a decent rate of adoption).
What else? I have been pretty impressed with user figures for generative AI applications. A commercially viable (read: useful) version of these systems has only really been on the cards since the release of ChatGPT at the end of 2022. Since then, Pew Research reckons that almost a quarter of Americans have used ChatGPT, with almost half of young people using the platform. During that time, OpenAI has released the much improved GPT-4 model and updated its image generation function with DALL-E 3, while Google and Anthropic have released models whose capabilities rival (and in some cases exceed) those produced by the San Francisco-based developer. More significant is that adoption has continued as newer versions of generative AI models have been rolled out and previous ones sunsetted. Over the next year, the usefulness of the technology will increase as better models introduce new functionality that enables these systems to act as virtual agents. If they prove commercially viable, then expect adoption to accelerate.
Best of the rest
Friday 10 May
Will AI dream up the hit TV shows of the future? (BBC)
We don’t need an AI manifesto — we need a constitution (FT)
AI boom set to fuel data centre deals in Asia this year (Reuters)
Artificial Intelligence 'Friends' (NYT)
AI in Earth observation: a force for good (European Space Agency)
Thursday 9 May
The Potential and Implications of Generative AI on HCI Education (arXiv)
OpenAI Is Readying a Search Product to Rival Google, Perplexity (Bloomberg)
Concerns on Bias in Large Language Models when Creating Synthetic Personae (arXiv)
Vote of confidence in UK economy as British AI company Wayve secures over $1 billion to develop AI for self-driving vehicles (UK Gov)
Microsoft is 'turning everyone into a prompt engineer' with new Copilot AI features (The Verge)
Microsoft Creates Top Secret Generative AI Service for US Spies (Bloomberg)
Wednesday 8 May
Understanding the source of what we see and hear online (OpenAI)
Opportunities for machine learning in scientific discovery (arXiv)
China-France Joint Statement on AI and Global Governance (China Gov)
Explainable machine learning for predicting shellfish toxicity in the Adriatic Sea using long-term monitoring data of HABs (arXiv)
Our approach to data and AI (OpenAI)
Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes (arXiv)
Tuesday 7 May
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions (arXiv)
Weekly Top Picks #72 (Substack)
Your guide to AI: May 2024 (Substack)
Scale AI Expands Global Footprint with New United Kingdom Headquarters (UK Gov)
Apple Is Developing AI Chips for Data Centers, Seeking Edge in Arms Race (WSJ)
Microsoft goes from bad boy to top cop in the age of AI (Politico)
A New Diplomatic Strategy Emerges as Artificial Intelligence Grows (NYT)
Monday 6 May
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents (arXiv)
AI for the Physical World (a16z)
API Partnership with Stack Overflow (OpenAI)
Do teachers spot AI? Evaluating the detectability of AI-generated texts among student essays (ScienceDirect)
OECD revises AI principles (OECD)
The misinformation wars - a reading list (Substack)
Job picks
Some of the interesting (mostly) non-technical AI roles that I’ve seen advertised in the last week. As always, it only includes new roles that have been posted since the last TWIE (but lots of the jobs from the previous edition are still open).
Research Scientist, Responsible Scaling Policy Evaluations, Autonomy, Anthropic (USA)
Research Manager, GovAI (Oxford, UK)
Research Scientist, Societal Impacts, Anthropic (US)
Mentee (Summer 2024), Supervised Program for Alignment Research (Remote)
Research Scientist, Responsible Scaling Policy Evaluations, Autonomy, Anthropic (UK)