
The Quest for Artificial Intelligence by Nils Nilsson is one of AI’s most well-known histories. Nilsson’s account is by no means flawless, but it is remarkably readable and captures the great majority of the field’s most important milestones.
For a book that focuses almost exclusively on the recent lineage of the technology, it picks a curious place to begin. Not the Dartmouth Summer Research Project on Artificial Intelligence. Not Alan Turing or Alonzo Church. And not the mathematicians and scientists of the early modern period.
As you might have guessed given the title of this post, The Quest for Artificial Intelligence starts with Aristotle. More specifically, Nilsson reckons AI’s story begins with the syllogism.
In simple terms, the syllogism is a classic form of logical argument that deduces a conclusion from two premises. Our man Aristotle formalised the pattern over two millennia ago, and it usually looks something like this:
All members of group X have property Y.
All members of group Z are members of group X.
Therefore, all members of group Z have property Y.
The most famous example runs: ‘All humans are mortal. Socrates is a human. Therefore, Socrates is mortal.’ By dropping ‘humans’, ‘mortal’, and ‘Socrates’ into the template, we get a logically valid conclusion.
In the syllogism, the form of the argument guarantees the conclusion regardless of the specific content. We could replace ‘humans’ with ‘athletes’ and ‘mortal’ with ‘healthy’ and the argument would stay valid (even if the premises became doubtful). Reasoning can be abstracted into these formal structures, which helps us separate how we reason from what we’re reasoning about.
Our abstraction means that if we can represent facts symbolically (like ‘All X are Y’), we can let the form of a syllogism carry us to new facts without needing new observations from the world.
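To make that concrete, here is a minimal Python sketch (my own illustration, not anything from Nilsson’s book) that stores facts in the ‘All X are Y’ shape and lets the form of the syllogism generate a new fact:

```python
# Facts of the form ("all", X, Y) read as "All members of group X have property Y"
# (being a member of a group is treated as just another property, for simplicity).
facts = {
    ("all", "humans", "mortal"),
    ("all", "greeks", "humans"),
}

def syllogise(facts):
    """All X are Y + All Z are X  =>  All Z are Y."""
    derived = set()
    for (_, x1, y) in facts:          # All X are Y
        for (_, z, x2) in facts:      # All Z are X
            if x1 == x2:
                derived.add(("all", z, y))
    return derived - facts

print(syllogise(facts))
# {('all', 'greeks', 'mortal')} -- a new fact, with no new observation of the world
```

The function never looks at what ‘greeks’ or ‘mortal’ actually mean; the pattern does all the work.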
For centuries, syllogisms and formal logic were held up as the model of good thinking in Europe (until the Baconian programme of hands-on observation became the preferred ideal of rationality). Later thinkers created devices that could handle symbols — like William Stanley Jevons’ mechanical ‘logic piano’ that used Boolean algebra to solve logical problems — which suggested that machines could in principle automatically carry out logical inferences.
Once they could shuffle symbols, theorists wondered whether they might turn all of mathematics, and perhaps thought itself, into formal syntax. From Frege’s Begriffsschrift to Hilbert’s programme for resolving the foundational crisis of mathematics, logicians treated reasoning as a game of symbolic moves played on blank paper. Mechanical devices proved those moves could be executed without human hands.
This idea stretched into the middle of the 20th century, when ‘AI’ began to emerge as a distinct field of research (though, as we know, its origins go much further back).
A new generation of researchers reasoned that if a computer could apply logical rules to symbols representing the world, then we might say that the machine was ‘reasoning’. In 1956, Allen Newell, Herbert Simon, and J.C. Shaw built the Logic Theorist, a program that proved mathematical theorems by searching for proofs in propositional logic (an algebraic descendant of Aristotle’s syllogism).
By the 1970s, these ideas were core to the ‘symbolic’ school of AI, one of two main approaches to building thinking machines alongside the ‘connectionist’ branch that includes modern neural networks.
The symbolists used hand-written if–then rules (inspired by first-order logic) that operated on strings of symbols standing for real-world concepts. Many of these systems took the form of ‘expert systems’: AI programs designed to mimic the decision-making of human specialists by applying rules like ‘if conditions A, B, and C are true, then conclude X.’
Stanford University’s medical expert system MYCIN had around 450 rules encoding knowledge about infections. A simplified MYCIN-style rule looked a bit like: ‘IF the organism is a Gram-positive coccus AND (the infection is hospital-acquired OR the strain is known to be penicillin-resistant), THEN suspect Staphylococcus aureus.’
MYCIN could also explain its reasoning in plain English by tracing the rules it used, which is one of the reasons expert systems are remembered fondly by some AI researchers (in comparison to the famously opaque neural networks that dominate the research landscape today).
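To give a flavour of both the rule format and the explanation trace, here is a toy forward-chaining engine in Python. It is only a sketch: MYCIN itself was written in Lisp and weighed its conclusions with certainty factors, and the rules and facts below are invented for illustration.

```python
# Toy forward-chaining rule engine -- a sketch of the expert-system idea only.
rules = [
    {"if": {"gram_positive_coccus", "hospital_acquired"},
     "then": "suspect_staph_aureus"},
    {"if": {"suspect_staph_aureus", "penicillin_resistant"},
     "then": "flag_for_resistant_treatment"},   # invented rule, purely illustrative
]

facts = {"gram_positive_coccus", "hospital_acquired", "penicillin_resistant"}
trace = []

changed = True
while changed:                        # keep firing rules until nothing new follows
    changed = False
    for rule in rules:
        if rule["if"] <= facts and rule["then"] not in facts:
            facts.add(rule["then"])
            trace.append(f"{rule['then']} because {sorted(rule['if'])} all hold")
            changed = True

print("\n".join(trace))               # replaying the fired rules is the explanation
```

Replaying the fired rules is all the ‘explanation’ amounts to, which is exactly why it is so legible.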
The philosophy behind expert systems held that if we explicitly tell a machine the right facts and rules, it can logically deduce new conclusions much as a human would. Give the computer the right heuristics, and it will behave intelligently within a given domain.
Ways of knowing
The logical approach at the core of early AI lived in Aristotle’s shadow, but the man himself had a different view of how intelligence works. His position was closer to what later thinkers would call empiricism, the belief that everything we know begins with our senses (though he still thought the mind needed to organise those raw impressions into general ideas).
We meet horses, smell pine, and feel heat. From encounters like these, the mind actively grasps what is common and remembers those associations for the future. In this picture, the form of ‘horseness’ already exists in the world, and the intellect seeks it out.
A different view is ‘nativism’, which holds that knowledge is primarily innate. Plato, famous too for his role as Aristotle’s teacher, suggested that learning is a process of recollection in which the soul remembers truths it knew before birth. In modern terms, we can think of nativism as the idea that the brain comes pre-wired with certain concepts or ways of reasoning (an idea that runs through Cartesian philosophy all the way to Chomsky’s universal grammar).
I’m leaving much on the cutting room floor, but the broader point is that empiricists say knowledge is mainly a function of experience, while nativists say what we know is mainly hard-coded. It’s a spectrum, really — few would deny experience any role, and few would claim everything is innate — but the question is where the emphasis lies.
Nonetheless, this question became the essential philosophical dividing line between symbolic AI’s expert systems and connectionism’s neural networks (and I might argue that it partly explains the wildly different views people have about the AI project today).
The symbolic approach behind expert systems is nativist in that knowledge is directly hard-wired into the machine. MYCIN didn’t deduce the principles of infectious disease by reading medical journals or analysing patient data on its own; the Stanford team fed it all the relevant rules they could gather from doctors.
Contrast this with the neural network tradition, which we can view as heir to empiricist conceptions of mind. Here, instead of loading the machine with explicit knowledge, we let our model learn from examples (cue klaxon).
Frank Rosenblatt’s perceptron, which we discussed in AI Histories #7, is a classic example of this approach. When Rosenblatt wanted it to recognise dots, he didn’t state rules; he simply showed it enough examples until the network figured the task out on its own.
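For a sense of what ‘showing it examples’ looks like in code, here is a bare-bones version of the perceptron learning rule in Python. It is a sketch under toy assumptions (random 2D points split by a line), not Rosenblatt’s hardware or his original task:

```python
import random

# Toy data: 2D points labelled +1 if they sit on one side of a line (x > y), else -1.
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labelled = [(x, y, 1 if x > y else -1) for x, y in points]

w = [0.0, 0.0]   # weights
b = 0.0          # bias

# Perceptron learning rule: nudge the weights only when a point is misclassified.
for _ in range(20):                   # a few passes over the examples
    for x, y, target in labelled:
        prediction = 1 if w[0] * x + w[1] * y + b > 0 else -1
        if prediction != target:
            w[0] += target * x
            w[1] += target * y
            b += target

errors = sum(1 for x, y, t in labelled
             if (1 if w[0] * x + w[1] * y + b > 0 else -1) != t)
print(f"weights={w}, bias={b:.1f}, misclassified={errors}")
```

Nothing in the loop states the rule ‘x greater than y’; the weights drift towards it purely from misclassified examples.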
Each side had valid points. Symbolic systems follow explicit chains of logic, so they rarely violate the known principles of their domain. But they are bad at adapting to new situations beyond their knowledge base, and acquiring the knowledge for each new domain is hard work.
Empiricist systems like neural networks are great at learning new stuff. They can ingest massive amounts of data and find structure on their own, often noticing subtle correlations humans haven’t encoded. This makes them very powerful for tasks like vision or speech, where nobody knows how to write down explicit rules that yield reliable answers.
But we still don’t fully understand some of the underlying processes that make connectionist systems tick, and they have a tendency to break in strange ways when faced with inputs that don’t correspond to patterns they are familiar with.
These weaknesses are in some ways mirror images. The symbolic approach lacks flexibility (a strength of empiricism), and the learning approach lacks clear causal structure (a strength of nativism).
It’s for this reason that people get very excited about ‘neurosymbolic AI’, which is about building composite systems that pair neural networks with symbolic machinery layered on top. I wrote about this idea a few weeks ago in the context of the ‘systemisation’ of large models, which aims to sand down some of their rough edges:
Systemisation is about making the core model a node within a bigger apparatus. We keep the language model in place, but surround it with specialist gadgets: a web-search look-up, a code sandbox, a vision encoder, and a knowledge base. The model doesn’t need to have all the answers; it just needs to decide when and how to invoke the right tool.
In practice that looks less like grafting logic inside the network and more like giving it a bunch of rule-bound helpers. The neural network core supplies fluency, while the plug-ins give it the things pattern-matchers are bad at (e.g. exact recall, arithmetic certainty, and pulling out up-to-date facts).
In other words, we patch the brittleness of symbolic AI with learning, and the exotic failure modes of pure learning with explicit rules. It’s not the neat marriage of neurons and symbols that some theorists once imagined, but it is a workable settlement that is already yielding systems much more powerful than the sum of their parts.
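As a rough sketch of that settlement, the code below stands a placeholder call_model() in for a real language model and wires it to two trivial, rule-bound tools. Every name here is invented for illustration rather than drawn from any particular framework:

```python
# A sketch of the "model as a node in a bigger apparatus" idea. Everything here
# is a stand-in: call_model() is a placeholder for a real language-model API,
# and the tools are deliberately trivial.

def calculator(expression: str) -> str:
    # Rule-bound helper for what pattern-matchers are bad at: exact arithmetic.
    return str(eval(expression, {"__builtins__": {}}))  # fine for a toy; never eval untrusted input

def knowledge_base(query: str) -> str:
    # Stand-in for retrieval of exact, up-to-date facts.
    facts = {"mycin_rules": "around 450 rules about infections"}
    return facts.get(query, "no entry found")

TOOLS = {"calculator": calculator, "knowledge_base": knowledge_base}

def call_model(prompt: str) -> dict:
    # Placeholder: a real system would ask the language model to pick a tool and its input.
    if any(ch.isdigit() for ch in prompt):
        return {"tool": "calculator", "input": prompt}
    return {"tool": "knowledge_base", "input": "mycin_rules"}

def answer(prompt: str) -> str:
    decision = call_model(prompt)      # the model decides *when and how*...
    tool = TOOLS[decision["tool"]]     # ...and a rule-bound helper does the work
    return tool(decision["input"])

print(answer("17 * 43"))                          # -> 731
print(answer("How many rules did MYCIN have?"))   # -> around 450 rules about infections
```

The design choice worth noticing is that the learned component only routes; the exact answer always comes from a deterministic helper.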
Aristotle helped lay the foundations for both traditions. He formulated the syllogism that would eventually help define rule-based reasoning, while reminding us that knowledge begins with experience.
Nilsson opens his book with the Greek philosopher because he was himself a symbolic AI researcher who saw the syllogism in everything he did. He was writing in the mid-2000s, before deep learning emerged to blow past one wall after another.
If an AI researcher wrote the same type of book today, they might also be tempted to start with Aristotle. But the syllogism might have to take a back seat.