Agents, music, and LLMs in science [TWIE]
The Week In Examples #31 | 6 April 2024
We begin with a thank you to everyone who gave me feedback on last week’s roundup. The results are in: more arXiv papers are here to stay. One small caveat is that arXiv has a bit of a lag in uploading papers, so Fridays are probably going to be slightly more media-centric than other days. Either way, the comments, emails, and messages were very helpful. If you want to tell me what you think—or get in touch for any other reason—it’s hp464@cam.ac.uk as always.
As for The Week In Examples, in this edition we look at a new paper on governing AI agents from the University of Toronto, a Stanford study tracking the use of large language models in scientific research, and a study linking national anthems to various national indicators. Vamos!
Three things
1. Living with AI agents
What happened? In 2003, The Matrix Reloaded told us that the agents are coming, but sadly left out meditations on the legal, social, or economic ramifications of agentic AI systems. Over 20 years later, researchers are going where the filmmakers dared not, and have begun to sketch out some of the problems that we need to solve before we can safely live with AI agents. In that spirit, University of Toronto researcher Noam Kolt applied good old-fashioned economic and legal theory to AI agents (defined as ‘autopilots’ that can independently take actions to accomplish complex goals on behalf of users) in a new paper published earlier this week. The research identifies a handful of problems that are likely to arise in a world with thousands or even millions of agents working on behalf of humans. These include discretionary authority (making sure the agent doesn’t use the authority delegated to it to act unreasonably), loyalty (determining how best to keep an agent acting in the user’s best interests), delegation (how to manage the creation of subagents), and information asymmetry (managing situations in which the agent knows more than the person, or ‘principal’, employing it).
What's interesting? I am going to skip through tonnes of good stuff, but essentially Kolt says the traditional solutions to these problems—like creating carrot-and-stick incentives, effectively monitoring behaviour, and using penalties to constrain behaviour—may be difficult to implement. The upshot is the suggestion that we should think about building incentives to align the agent with the needs and values of both the user and society, work on technical solutions to enable visibility, and introduce a new liability regime for agents. To do that, Kolt reckons there are three questions we ought to address: 1) which actors should be held liable for harm caused by AI agents, 2) under what circumstances should liability arise, and 3) what is the appropriate standard of care? These are big questions, and the paper doesn’t profess to answer them, but I do want to emphasise that, right now, it’s not all that clear who should be responsible should an agent cause harm to others.
What else? Agents do not have legal personhood, so they cannot be held criminally liable in the way that you or I can. So if not the agent, then who? There are a few options: the person who initially instructed the agent to act, the organisation that deployed the agent, and the developer who built it. Each poses its own issues: determining whether the agent was really manifesting the user’s intent, whether the agent can be viewed as the deployer’s employee, and whether the developer could be deemed close enough to an action taken by one of its agents to bear responsibility for it. Right now, there’s just not enough information to say how this is likely to play out – but I suspect it is one of those things that will be answered shortly after the agents start rolling off the metaphorical assembly line.
2. Scientists’ use of LLMs on the rise
What happened? Stanford University released an assessment of LLM usage in scientific research. In new work, researchers analysed 950,965 papers published between January 2020 and February 2024 on arXiv, bioRxiv, and the Nature portfolio of journals. The group found that the use of large language models is on the rise across the board, with the largest and fastest growth observed in computer science papers (up to 17.5%). By way of comparison, the authors reckoned that mathematics papers and the Nature portfolio showed the least LLM usage (up to 6.3%).
What's interesting? The extent to which analysis techniques are capable of detecting AI-generated writing remains in doubt, with popular detector tools generally thought to be unreliable. In this case, however, the researchers suggest that their approach—taking a corpus-level view that avoids the need to classify individual documents or sentences—is likely to be more robust. If we accept the conclusion that usage is on the rise, which certainly feels plausible given information volunteered by scientists (see below), then the study shows that certain domains are more likely to use LLMs than others. The prevalence of usage in computer science is perhaps unsurprising given these are the people who are most likely to be familiar with the technology (and its limitations), but it is curious that the bioRxiv corpus shows a rate of only about 7.5%. Maybe that’s because biologists aren’t as familiar with LLMs as computer scientists are, but it does strike me that language models could find a lot of use there. That the Nature portfolio set, made up of 15 journals, shows relatively low usage may be an indicator that the spectre of peer review discourages use. Possibly a rare win for the peer review system!
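For a sense of what a corpus-level estimate looks like in practice, here is a minimal sketch in Python. To be clear, this is not the authors’ exact estimator; it is just an illustration, with made-up marker-word rates and counts, of how one might back out the share of LLM-modified documents from aggregate word frequencies without classifying any single paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical inputs: for a handful of "marker" words, the probability that a
# document contains the word in known-human text (p_human) and in known-LLM
# text (p_llm), plus how many documents in the target corpus contain each word.
p_human = np.array([0.010, 0.004, 0.020, 0.008])   # made-up reference rates
p_llm = np.array([0.060, 0.030, 0.005, 0.045])     # made-up reference rates
n_docs = 10_000                                    # documents in the target corpus
observed_counts = np.array([180, 95, 175, 140])    # made-up observed counts

def neg_log_likelihood(alpha: float) -> float:
    """Binomial negative log-likelihood of the observed counts if a fraction
    alpha of documents is LLM-modified and the rest is human-written."""
    p_mix = (1 - alpha) * p_human + alpha * p_llm
    return -np.sum(
        observed_counts * np.log(p_mix)
        + (n_docs - observed_counts) * np.log(1 - p_mix)
    )

# Pick the mixture fraction that best explains the corpus-level word counts.
result = minimize_scalar(neg_log_likelihood, bounds=(0.0, 1.0), method="bounded")
print(f"Estimated share of LLM-modified documents: {result.x:.1%}")
```

The appeal of framing it this way is that noisy per-document classification is replaced by a single aggregate estimate per field, which is the kind of number quoted above for computer science and mathematics.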
What else? In a 2023 survey of more than 1,600 scientists, Nature found that almost 30% said they had used generative AI tools to help write manuscripts, and about 15% said they had used them to help with grant applications. On the benefits of AI, over half (55%) of researchers cited translation, a finding replicated in a separate poll by the European Research Council (ERC). As for downsides, around 70% thought the technology could lead to “more reliance on pattern recognition without understanding”, while 59% said it may entrench bias. Of course, these are issues that could, in principle, be overcome with good research practices and a strong review process — but in practice researchers are, after all, human too. People generally like to take the path of least resistance, and scientists are no different.
3. Researchers march to their own beat
What happened? Researchers from BRAC University in Bangladesh and Hamad Bin Khalifa University in Qatar (plus a few others) used a series of statistical tools to investigate the relationship between national anthems and several global indices: the World Peace Index, the World Suicide Rate Index, the World Crime Index, the World Happiness Index, and the World Human Development Index (each taken from World Population Review). They collected national anthems from 169 countries and determined that “certain factors, such as low pitch, high tempo, high beat, low note duration, and high rest duration, may be associated with lower suicide rates and higher scores for happiness and peace.”
What's interesting? Based on these results, the group reckoned that the findings have “implications” for policymakers (without saying what those implications actually are). The rub, though, is that there are probably thousands of confounding factors (outside influences that can affect the results of a study) shaping the relationship between the character of a country’s national anthem and the socioeconomic indicators in question. And, to be fair, this is something the researchers recognise: “this research only investigates the chances of correlation, not causation…though we have found a very significant correlation in some aspects, we cannot conclude that national anthems are the cause behind the countries’ global indices.”
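To make the correlation-versus-causation point concrete, here is a minimal sketch of the kind of rank-correlation exercise involved, using entirely made-up numbers rather than the study’s data: extract a few audio features per anthem, line them up against an index score, and compute a correlation coefficient. Nothing in it accounts for confounders.

```python
import pandas as pd
from scipy.stats import spearmanr

# Illustrative only: each row is a country, audio features would come from the
# anthem recording, and the index score from the published rankings.
anthems = pd.DataFrame({
    "tempo_bpm":       [120, 88, 104, 96, 132],    # made-up values
    "mean_pitch_hz":   [220, 196, 247, 175, 262],  # made-up values
    "happiness_score": [7.1, 5.9, 6.4, 5.2, 7.5],  # made-up values
})

# Rank correlation between each audio feature and the happiness score.
# Wealth, geography, history and so on are left entirely unmodelled.
for feature in ["tempo_bpm", "mean_pitch_hz"]:
    rho, p_value = spearmanr(anthems[feature], anthems["happiness_score"])
    print(f"{feature}: rho={rho:.2f}, p={p_value:.2f}")
```

Even a “very significant” coefficient from this sort of exercise tells you nothing about what would happen if a country rewrote its anthem.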
What else? That being said, they also reckon that national anthems are the “most sung and heard song of every country, so must have a deep imprint in the minds of its people.” Maybe I’m not patriotic enough, but it’s not all that clear whether national anthems have a substantial psychological effect on the population of a given country. I rarely think about “God Save The King”, even though it can be a lot of fun when people get singing in a pub before watching the England football team lose in the quarter-finals of the World Cup. But that’s sort of the point: the anthem is a stand-in for togetherness, not something whose substance creates any sense of belonging in its own right (though I’ll concede it may have done during the Napoleonic Wars). In any case, today I suspect that singing more or less anything catchy would get the same result.
Best of the rest
Friday 5 April
China will use AI to disrupt elections in the US, South Korea and India, Microsoft warns (The Guardian)
Poll: Are you worried that your job could be taken by AI? (The Engineer)
Inside Big Tech's underground race to buy AI training data (Reuters)
Data acquisition strategies for AI-first start-ups (Air Street Press)
EU-U.S. Terminology and Taxonomy for Artificial Intelligence - Second Edition (EU)
Thursday 4 April
AI and the Problem of Knowledge Collapse (arXiv)
Beyond One-Size-Fits-All Fairness (AI Substack >> Policy Perspectives)
Billie Eilish, Pearl Jam, Nicki Minaj Among 200 Artists Calling for Responsible AI Music Practices (Billboard)
InsectMamba: Insect Pest Classification with State Space Model (arXiv)
OpenAI’s GPT Store Is Triggering Copyright Complaints (WIRED)
Introducing improvements to the fine-tuning API and expanding our custom models program (OpenAI)
Alzheimer's disease detection in PSG signals (arXiv)
Wednesday 3 April
Responsible Reporting for Frontier AI Development (arXiv)
The Artificial Intelligence Ontology: LLM-assisted construction of AI concept hierarchies (arXiv)
Meta's AI image generator can't imagine an Asian man with a white woman (The Verge)
Former Snap AI chief launches Higgsfield to take on OpenAI’s Sora video generator (TechCrunch)
In a first, FDA authorizes AI-driven test to predict sepsis in hospitals (The Washington Post)
US, EU to Use AI to Seek Alternate Chemicals for Making Chips (Bloomberg)
You can now edit DALL·E images in ChatGPT across web, iOS, and Android (OpenAI >> X)
Amazon Ditches 'Just Walk Out' Checkouts at Its Grocery Stores (Gizmodo >> commentary)
Tuesday 2 April
AI Act and Large Language Models (LLMs): When critical issues and privacy impact require human and ethical oversight (arXiv)
AI WALKUP: A Computer-Vision Approach to Quantifying MDS-UPDRS in Parkinson's Disease (arXiv)
The Cartography of Generative AI (AI Cartography)
Deep Learning Foundations by Soheil Feizi : Large Language Models (YouTube)
Many-shot jailbreaking (Anthropic)
But what is a GPT? Visual intro to Transformers | Chapter 5, Deep Learning (YouTube)
The ‘Meta AI mafia’ brain drain continues with at least 3 more high-level departures (Fortune)
Are large language models superhuman chemists? (arXiv)
Monday 1 April (and things I missed from last week)
Mapping the Increasing Use of LLMs in Scientific Papers (arXiv)
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (arXiv)
Jamba: A Hybrid Transformer-Mamba Language Model (Hugging Face)
Implications of the AI Act for Non-Discrimination Law and Algorithmic Fairness (arXiv)
Artificial consciousness. Some logical and conceptual preliminaries (arXiv)
Uncovering Bias in Large Vision-Language Models with Counterfactuals (arXiv)
Inside the UK’s AI company incorporation boom (UKTN)
U.S., U.K. Announce Partnership to Safety Test AI Models (TIME)
Can Language Models Recognize Convincing Arguments? (arXiv)
A Trusted AI Compute Cluster for AI Verification and Evaluation (Lennart Heim)
Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times (EleutherAI)
A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures (arXiv)
Generative AI for Architectural Design: A Literature Review (arXiv)
Generation and Detection of Sign Language Deepfakes -- A Linguistic and Visual Analysis (arXiv)
Job picks
These are some of the interesting (mostly) non-technical AI roles that I’ve seen advertised in the last week. As always, it only includes new roles that have been posted since the last TWIE – though many of the jobs from two editions ago are still open.
Expression of Interest, Epoch, Remote (Global)
OpenAI Cybersecurity Grant Program, OpenAI, Remote (Global - repost)
Policy Internship, Center for Human-Compatible Artificial Intelligence (US)
Communications and Engagement Lead, Frontier Model Forum (US or UK)
European AI Policy Lead and Program Manager, OpenAI (EU)