The May pledge campaign is still in full swing. We’re on track, but if you haven’t pledged yet, I’d love your help getting there. Pledging $5 lets me know you’re behind the project as I work toward turning on paid subscriptions (and hopefully making this my full-time work). Huge thanks to everyone who’s pledged already. It really does mean a lot.

Automata theory is difficult to explain.
For many, it gets lumped in with the self-replicating machine. My mind wanders to the von Neumann probe: a hypothetical spacecraft that could fly to a nearby star, gobble up materials to produce copies of itself, and continue on to the next system until the entire galaxy had been checked out.
It’s a fun image, but one only partially connected to the topic at hand. At its core, automata theory is about abstract structures that evolve through finite internal states according to fixed rules. They are conceptual models, mathematical frameworks that capture precisely how systems transform, persist, or collapse over time.
It’s an idea that shape-shifts, one that likes to be all things to all people. Or it is in AI, anyway.
That’s because automata theory has shaped the two foundational schools of the field. Both the symbolic approach, in which rules are hard-coded into a system, and the connectionist branch, where systems learn from examples (cue klaxon), have been influenced by its ideas.
For the former, that means designing systems that follow clear, rule-based transitions; for the latter, it involves reimagining those transitions as fluid patterns rather than fixed instructions.
For the purposes of this post, I’m defining an automaton as ‘a self-contained system that responds to inputs by changing state, whose behaviour is determined by its design.’ In practice, that means anything from a light switch to a language model can be seen as an automaton (so long as its next move depends on its current state and some external input).
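To make that definition concrete, here’s a minimal sketch in Python (the state names and inputs are my own, purely illustrative): a light switch modelled as a two-state automaton whose next move depends only on its current state and the input it receives.

```python
# A minimal finite automaton: a light switch with two states.
# The transition table is the whole design; behaviour is fully
# determined by (current state, input) pairs.
TRANSITIONS = {
    ("off", "press"): "on",
    ("on", "press"): "off",
}

def step(state: str, event: str) -> str:
    """Return the next state given the current state and an input."""
    return TRANSITIONS[(state, event)]

state = "off"
for event in ["press", "press", "press"]:
    state = step(state, event)
print(state)  # -> on: three presses from off
```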
Letters and numbers
Automata theory begins as an attempt to pin down the limits of reason. In the 1930s, a handful of logicians—Alonzo Church at Princeton, Alan Turing in Cambridge, and Emil Post in New York—found themselves asking what it means to compute something.
At stake was whether all of mathematics could, in principle, be reduced to symbolic procedures carried out by rule-following agents. To answer the question, these thinkers built abstract machines.
Church used λ-calculus, Post proposed rewriting systems, and Turing devised a model so evocative it would take his name. Each was a kind of automaton in that it described a self-contained system that processes inputs and moves through internal states according to fixed rules.
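None of their historical constructions would fit in a post like this, but a toy version gives the flavour. Here’s a sketch of a tiny Turing-style machine in Python, with an invented rule table (my own, for illustration) that flips every bit on its tape and halts:

```python
# A toy Turing machine that inverts a binary string, then halts.
# The transition table maps (state, symbol) to
# (symbol to write, head movement, next state).
RULES = {
    ("scan", "0"): ("1", +1, "scan"),
    ("scan", "1"): ("0", +1, "scan"),
    ("scan", "_"): ("_", 0, "halt"),  # "_" marks blank tape
}

def run(tape: str) -> str:
    cells = list(tape) + ["_"]        # finite tape with a blank sentinel
    state, head = "scan", 0
    while state != "halt":
        written, move, state = RULES[(state, cells[head])]
        cells[head] = written
        head += move
    return "".join(cells).rstrip("_")

print(run("1011"))  # -> 0100
```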
It’s from this moment that automata theory begins to take form as a toolkit for describing procedural reasoning with mathematics. What started as a way to solve problems in logic would eventually lay the conceptual groundwork for the artificial intelligence project.
As symbolic automata were being marshalled to model thought, John von Neumann wondered whether the principles of life itself could be described in the language of mathematics. Working with Stanislaw Ulam at Los Alamos, he devised the idea of a self-replicating automaton: a system that, given a set of instructions, could construct a copy of itself within a defined grid-like universe.
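Von Neumann’s actual construction used 29 states per cell and won’t fit here, so what follows is a sketch of a far simpler relative: a one-dimensional cellular automaton (Wolfram’s later Rule 110), where every cell updates from its neighbourhood according to the same fixed rule.

```python
# A one-dimensional cellular automaton (Rule 110). Each cell's next
# value is a fixed function of itself and its two neighbours: local
# rules, global behaviour, all on a grid-like universe.
RULE = 110  # the whole update table, packed into one integer's bits

def step(cells: list[int]) -> list[int]:
    out = []
    for i in range(len(cells)):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % len(cells)]
        index = (left << 2) | (centre << 1) | right  # neighbourhood as 0..7
        out.append((RULE >> index) & 1)
    return out

cells = [0] * 31 + [1] + [0] * 31            # start from a single live cell
for _ in range(16):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
```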
By the 1950s, the question was no longer ‘can something be computed?’ The logicians of the 1930s had already mapped the limits of what could and couldn’t be, so theorists turned instead to structure. They wanted to know how different types of machines process different kinds of inputs.
Stephen Kleene, working with the mighty RAND Corporation, established the idea of regular expressions (patterns recognised by simple machines that move deterministically between states). Then Michael Rabin and Dana Scott showed how automata could be adapted to include non-determinism, exploring branching paths with multiple possible futures, and proved that these nondeterministic machines recognise exactly the same languages as their deterministic counterparts.
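As an illustration, take binary strings that end in ‘01’. That’s a regular language (the regular expression (0|1)*01 matches it), and a machine with three states and no memory beyond them suffices to recognise it. A sketch, with state names of my own choosing:

```python
# A deterministic finite automaton for binary strings ending in "01".
# Regular languages are exactly those recognisable with finitely many
# states and no auxiliary memory.
TRANSITIONS = {
    ("start", "0"): "seen0",
    ("start", "1"): "start",
    ("seen0", "0"): "seen0",
    ("seen0", "1"): "seen01",
    ("seen01", "0"): "seen0",
    ("seen01", "1"): "start",
}
ACCEPTING = {"seen01"}

def accepts(word: str) -> bool:
    state = "start"
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING

print(accepts("1101"))  # True: ends in "01"
print(accepts("0110"))  # False
```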
And from linguistics, a young Noam Chomsky imported a powerful organising framework. He saw a hierarchy of formal languages, each defined by the type of automaton that could recognise it. Chomsky showed that you can rank languages by how complicated a machine you’d need to recognise them:
At the bottom, you have regular languages, recognised by simple machines (finite automata).
Above that are context-free languages, which need a machine with a memory stack (see the sketch after this list).
Then come context-sensitive languages, needing a still more powerful device known as a linear-bounded automaton.
At the top, you have recursively enumerable languages, which require a full Turing machine (the most powerful kind of abstract computer).
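To see why the stack on the second rung matters, take balanced parentheses. No finite automaton can count arbitrarily deep nesting, but a machine with a stack can. This Python sketch captures the pushdown idea (the stack discipline, not the full formal machinery):

```python
# Balanced parentheses form a context-free language: recognising them
# needs a stack, because finitely many states cannot count unbounded
# nesting depth.
def balanced(word: str) -> bool:
    stack: list[str] = []
    for symbol in word:
        if symbol == "(":
            stack.append(symbol)   # push on an opener
        elif symbol == ")":
            if not stack:
                return False       # a closer with nothing to match
            stack.pop()            # pop the matching opener
    return not stack               # accept iff every opener was closed

print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```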
In Chomsky’s hands, automata became models of cognition. For if language could be parsed by machines, perhaps the mind could be formalised as a generative system wired to produce and recognise structure.
Transition and constraint
By the second half of the 20th century, automata theory was something like a working metaphor for the mind. In symbolic AI and cognitive science, automata provided a language for representing internal mental states, transitions, and procedural rules.
But in parallel, a different tradition was emerging. Inspired less by formal logic than by biology, researchers began to design networks that learned patterns over time. These connectionist models were a different kind of automaton: systems defined by internal states that evolved through transitions, only here the transitions were learned rather than programmed.
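To make the comparison concrete, here’s a sketch of a recurrent cell in Python with NumPy. Structurally it’s the same picture as the light switch earlier, a state updated by an input, except the state is continuous and the transition’s parameters (random stand-ins below) would be learned from data rather than written by hand.

```python
import numpy as np

# A recurrent cell as an automaton with learned transitions. The state
# h is a continuous vector rather than one of finitely many symbols,
# and the parameters W, U, b are what training would adjust.
rng = np.random.default_rng(0)
STATE, INPUT = 8, 4
W = rng.normal(size=(STATE, INPUT)) * 0.1  # stand-in weights; in practice
U = rng.normal(size=(STATE, STATE)) * 0.1  # these come from gradient descent
b = np.zeros(STATE)

def step(h: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One transition: next state from current state and input."""
    return np.tanh(W @ x + U @ h + b)

h = np.zeros(STATE)                        # initial state
for x in rng.normal(size=(5, INPUT)):      # a sequence of five inputs
    h = step(h, x)
print(h.round(3))                          # the final internal state
```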
Whether hand-coded or trained, both symbolic and connectionist systems relied on the idea that intelligence unfolds through structured change. Rules or learning, thinking became a process of moving from one configuration of the system to the next.
To study automata is to ask how systems maintain identity through transformation. Whether parsing a sentence or navigating a decision tree, they provide a way of thinking about structure in motion.
At their most abstract, they embody a vision of intelligence as patterned change within bounds. They help us model how rules unfold and how memory shapes behaviour.
Every time we train a model, we rely on assumptions about transition and constraint. Even neural networks, for all their complexity, are still systems that evolve through internal states.
I like automata theory because it suggests that intelligence, artificial or otherwise, is both stable and dynamic. It’s a timely reminder that underneath probabilistic outputs lies the idea that thought is a process, and that process can be mapped.