John Wilkins was a mover and shaker in the early years of the Royal Society. He was a clergyman and an experimenter whose passion project was ‘philosophical language’, a universal written system that could directly correspond with the structure of things in the world.
Wilkins wanted to turn the bones of language into an ontological framework for making sense of reality. His efforts remind me of noted postmodern linguist Plato, whose Cratylus put forward the idea that words must have intrinsic meanings. We are told that the Homeric hero Hector, for example, gets his name from the Greek verb ‘échein’ or ‘to hold’ because he was said to ‘hold’ the city of Troy as its great protector.
In his 1668 An Essay towards a Real Character, Wilkins introduced descriptive tables that showed how components of language could be used to classify certain animals. The word for ‘elephant’ turns up as ‘zibi’, made up of ‘zi’ (the two-letter root for every beast), followed by a ‘b’ (the consonant marking whole-footed mammals), before finishing with ‘i’ (the vowel assigned to the corresponding species in that row).
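The scheme is, in effect, a compositional encoding: each position in the word carries one level of the classification. A toy sketch of the idea, using only the ‘zibi’ example above (the tables here are illustrative stand-ins, not Wilkins’ actual ones):

```python
# Toy reconstruction of Wilkins' compositional word-building.
# These lookup tables are invented for illustration; only the
# 'zibi' = zi + b + i example comes from the text above.

GENUS = {"beast": "zi"}              # two-letter root for the genus
DIFFERENCE = {"whole-footed": "b"}   # consonant marking the sub-group
SPECIES = {"elephant": "i"}          # vowel picking out the species in its row

def wilkins_word(genus: str, difference: str, species: str) -> str:
    """Compose a 'philosophical' word from its classificatory parts."""
    return GENUS[genus] + DIFFERENCE[difference] + SPECIES[species]

print(wilkins_word("beast", "whole-footed", "elephant"))  # -> zibi
```

The word itself is the classification: to know how to say ‘elephant’ is to know where elephants sit in the scheme, which is precisely why the scheme inherits every weakness of the scheme's boxes.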
Like so many neat ideas, Wilkins’ philosophical language dissolved on contact with reality. It was too clever and too clumsy. Once you try to sort things into boxes, you soon find that the world has an annoying habit of contorting to avoid easy classification.
The Argentine writer Jorge Luis Borges encountered Wilkins’ work in the 1940s, then famously sent it up by describing a ‘Celestial Emporium’ whose animal classes include ‘those belonging to the Emperor,’ ‘frenzied ones,’ and ‘those included in this classification.’ Borges’ point was that taxonomies are as arbitrary as they are brittle, and that they tend to break the moment they are faced with a creature, a culture, or a contradiction that doesn’t fit the scheme.
Wilkins thought his work could be a cabinet of cabinets, a scala naturae for the age of microscopes and coffee house empiricism. He lived as science wrestled with its Scholastic inheritance, a drive to fix the natures of things by figuring out what they were and how they were related to one another. Our man pushed that logic to its obvious conclusion. If the world is orderly, then a language that mirrors that order must also be orderly.
One of these things is not like the others
On 29 March 1823 a package from Sir Thomas Brisbane, the Governor of New South Wales, arrived at Edinburgh College Museum. Inside were two platypus carcasses, their ‘rostrum half dissolved, and the pile loose,’ as the curator’s assistant William MacGillivray grumbled in his log.
One went to the display case, the other to the Scottish anatomist Robert Knox’s dissecting table, where its curious mix of qualities proved inconvenient for every classification schema of the day. Knox found fur but no nipples. A keratinous beak and a cloaca, but no feathers to match. And a venomous spur without a cold-blooded body temperature.
There were lots of ways to cut the taxonomical cake, but the knife of choice was forged when Carl Linnaeus laid down the rules some sixty-five years earlier. In the 1758 Systema Naturae he offered a key composed of classes, orders, genera, and species, each demarcated by a handful of traits. Hair and teats? Mammal. Feathers and a beak? Bird. Scales and cold blood? Reptile. The attraction was its promise of mutual exclusivity: once a creature was placed within one class, every other was off limits. For a while it worked a charm, letting European naturalists sort the spoils of empire into their preferred locations.
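A key of this kind is just a checklist of traits with mutually exclusive outcomes, which makes its failure mode easy to see. A minimal sketch, using the simplified diagnostic traits from the text rather than Linnaeus’ actual diagnoses:

```python
# A Linnaean-style key as a trait checklist. The rules are the
# simplified ones from the text, not Linnaeus' real diagnoses.

def classify(traits: set) -> str:
    if {"hair", "teats"} <= traits:          # hair and teats? mammal
        return "Mammalia"
    if {"feathers", "beak"} <= traits:       # feathers and a beak? bird
        return "Aves"
    if {"scales", "cold blood"} <= traits:   # scales and cold blood? reptile
        return "Reptilia"
    return "unclassifiable"                  # the key has no box for it

duck = {"feathers", "beak"}
# Knox's specimen: fur but no teats, a beak but no feathers
platypus = {"hair", "beak", "venomous spur"}

print(classify(duck))      # -> Aves
print(classify(platypus))  # -> unclassifiable
```

The platypus satisfies part of two rules and the whole of none, so it falls straight through the key; the only remedies are the addenda, sub-orders, and footnotes described below.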
By the early nineteenth century, however, the anomalies were coming thick and fast. Marsupials that suckled young yet carried their offspring in a pouch; microscopic euglena that prowled for food like an animal but carried chloroplasts like a plant; and the duck-billed creature that arrived in Scotland to confound the biologists. Each exception forced addenda, sub-orders, and awkward footnotes until the Linnaean grid was overrun by a patchwork of special cases.
To some extent the Victorians were alive to these anxieties. In A System of Logic, published two decades after the incident in Edinburgh, John Stuart Mill argued that some groupings track real causal similarities while others are categories of convenience. After On the Origin of Species in 1859, the taxonomical project shifted away from fixed essences and towards a genealogical map of shared descent. In the post-Darwinian order, classification was explicitly about relations and overlaps instead of Platonic blueprints.
As Ferdinand de Saussure took much glee in pointing out, a sign has no natural bond to its referent. ‘Tree’ is not tree-ness in sound-form; it’s the noise we agree on because it isn’t ‘three’, ‘free’, or ‘shrub’. Meaning is the friction produced by contrast among signs. Vocabulary triangulates between differences and the essence of the thing is only stable insofar as it exists as what Ludwig Wittgenstein called a ‘family resemblance’ between things with overlapping similarities.
Moral philosophy by checklist
Values are the structure we impose on the messiness of the moral universe. They are meant to “capture collective wisdom about what is important in human life, in various contexts and at various scales” and help us sort the better from the bitter.
For large language models, as in technology more generally, we appeal to ‘values’ as a source of illumination to help us puzzle through the most difficult questions and choices. Alas, the concept of ‘values’ is better seen as a symptom of confusion. We retreat behind values when we can’t find the right words for talking precisely about the most basic aspects of the human condition.
As Langdon Winner put it in The Whale and the Reactor almost forty years ago:
In a seemingly endless array of books, articles, and scholarly meetings, the hollow discourse about "values" usurps much of the space formerly occupied by much richer, more expressive categories of moral and political language. The longer such talk continues, the more vacuous it becomes, the further removed from any solid ground.
We are inclined to believe that there have always been ‘values’, just as surely as there has been a long history of spirited discussion about them. Except that isn’t really true. People have always had commitments, responsibilities, preferences, tastes, aspirations, convictions and cares. But only in the last century or so has anyone bundled these things together as ‘values’ as we might understand them today.
Used as a noun, the word ‘value’ is an old term that has throughout most of its history meant ‘the worth of something’. Commonly it meant the worth of an object in material exchange, or the status and worthiness of a person in the eyes of others. The word properly enters social and political thought in the writings of eighteenth and nineteenth century political economists, most consequentially via Adam Smith, David Ricardo, and Karl Marx.
For them ‘value’ meant the worth of a thing in a commercial sense, which is why a theory of value first appears wearing the clothes of economics. Later in the nineteenth century Friedrich Nietzsche commandeered the term to signify the sum of principles, ideals, and desires that make up the basic motivational structure of a person or people.
Nietzsche wrote about the need for Umwertung aller Werte or the ‘revaluation of all values’, a kind of controlled demolition of Christian morality. He wanted to tear down the moral order, sift through the rubble for anything still moving, and then rebuild a more life-affirming house from the ground up.
Later, Ralph Barton Perry proposed a ‘general theory of value’ that tried to give a reasonable account of the full range of human interests. Value in this setting is any object of interest, whether that interest is aesthetic, moral, economic, or religious. He grounded these concerns in the life of instinct or desire, then cast ethics as a social technology for reconciling the inevitable clashes among them.
Even towards the middle of the twentieth century, talk of ‘value’ was generally taken to be about some attribute of a given object. One might use or keep safe a thing because it had a certain value. Economic or sentimental, value was still value. We still accept this meaning, as say the ‘value of’ intellectual property or spending time with one’s family.
Today, you are just as likely to hear ‘value’ used to describe wholly subjective phenomena. People, groups, cultures, and even whole countries (British Values™ or American Values™) apparently have values that influence how they show up in the world.
These kinds of values are basically general dispositions, a semi-conscious filter of taste or conduct that resides in us rather than in the world. We do not cherish charity because charity is good; charity is good because our internal value set fires a positive signal when we see some philanthropy that we approve of.
All such things are personal sentiments, don’t you know, despite the fact that they can also be stretched across the full width of the nation state. You have your values just as I have mine. One community exalts self-reliance, another solidarity, a third ritual purity.
Our world is a values shop (not to be confused with a value supermarket full of discount deals), where we fill up the trolley with the values commensurate with internally held sentiments. Prices are strictly personal — your courage may be on two for one, my justice a luxury import — so haggling is futile.
The problem with this state of affairs is that it prevents us from thinking critically about the moral world. In the ethics of technology, things are rarely named outright as good, prudent, or admirable, and courses of action are seldom defended as fair or necessary. The winning move is to mumble about ‘values,’ as though the label itself ought to carry the day.
Keep off the grass
Values are a moral taxonomy, a set of friendly labels that lets corporations, governments, or individuals signal virtue without wrestling with the particulars. A list of values feels tidy, mutually exclusive, and reassuringly universal. But we know better, don’t we? Courage can too easily become recklessness, loyalty can clash with justice, and patience can take the edge off excellence.
For AI, critics and boosters both retreat behind ‘value alignment’ programmes that assume moral life can be rendered as a checklist — fairness, privacy, autonomy, and so on — and that the machine’s task is simply to occupy as many boxes as possible. You don’t need to say much about which values are preferable, you just need to cram as many as possible into your taxonomy of virtues. In fact, if you just make sure one of them is pluralism you can call it a day.
The most basic facets of the human condition are easily swallowed by the value alignment project. Don’t think too hard about it. Better to concede that moral life has no rough edges and that the work of judgement is secondary. Who cares to ask what courage demands or whom justice serves when you can list pleasant-sounding labels and pat yourself on the back for a job well done?
Behind the lists are ideals of good and harm, duty and power, claim and consequence. Those words bite because they force us to take sides and give reasons. They make the trade-offs real by reminding us that value alignment cannot in fact be all things to all people. Better than that, it forces us to concede that moral philosophy is more than ticking boxes.