An introduction to AI history
An overview of how we got here, what came before, and the key players, ideas, and trends in the history of AI
This post is the first word on the history of AI at Learning From Examples, but certainly not the last. I start by introducing core concepts and some of the problems with the historiography of AI as it exists today. Once we’re on the same page, I sketch a broad (and necessarily shallow) history of the field. You can think of this overview as a jumping-off point for future work that pulls on some of the threads I bundle together here.
Received wisdom has it that the origins of the discipline that we today call ‘artificial intelligence’ can be traced back to the summer of 1956, when a group met at Dartmouth College in Hanover, New Hampshire. The gathering, which took place over the course of two months, brought together researchers including John McCarthy (workshop organiser and developer of the LISP programming language), Marvin Minsky (a pivotal player in the intellectual battle between AI’s two most prominent subfields), and Claude Shannon (a major figure in information theory, and possible inspiration for the name of Anthropic’s flagship language model). The group explained their goals in a 1955 proposal:
We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.
Of course, ‘every aspect of learning or any other feature of intelligence’ was not and has not been simulated by machines. To understand why, consider the following phrases: ‘artificial intelligence’, ‘machine learning’, and ‘artificial neural network’. Histories of artificial intelligence, many of which are written by practitioners, generally consider each a branch of the same tree in which the artificial neural network sits neatly under machine learning, which is in turn a subset of artificial intelligence.
But that isn’t the case. Machine learning, and subsequently artificial neural networks, have their own distinct conceptual and technical roots. That is why today you occasionally see the field described as artificial intelligence and machine learning or AI/ML. When, in 1956, McCarthy and his contemporaries introduced ‘artificial intelligence’ into the academic lexicon, they were primarily talking about the symbolic school of AI in which hard-coded knowledge is instilled into a computer system. I will unpack what this means, but for now all you need to understand is that symbolic reasoning exists in parallel to a style of ‘AI’ known as connectionism. This field is inextricably linked with machine learning and is the conceptual forebear to the vast majority of modern AI. As we shall see, its roots go further back than 1956.
This isn’t just semantics. Because modern AI is connectionism, and because connectionism did not spring up out of the ground in the fields of New Hampshire, we have to concede that, no, AI did not in fact emerge in 1956. Of course, the Dartmouth group popularised the term and influenced the trajectory of the technologies that would eventually become today’s large models, but more on that in a moment. The reason, then, that all features of intelligence have not been precisely ‘described’ by the creators of today’s AI systems (which, crucially, is not the same as their representation within a system) is straightforward: they stopped trying.
Winter is (not) coming
Let’s take a step back and untangle some of the above. As I mentioned, researchers have taken two different approaches to building intelligent machines. The first, largely the focus of the attendees of the workshop, is the symbolic approach in which systems are developed using hard-coded rules based on the manipulation of expressions. Heavily influenced by the ‘physical symbol system hypothesis’ developed by American AI researchers Allen Newell and Herbert Simon, symbolic reasoning assumes that aspects of ‘intelligence’ can be replicated through the manipulation of symbols, which in this context refers to representations of logic that a human operator can access and understand. Influential histories focused on this group include Pamela McCorduck’s Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence (1979) and Daniel Crevier’s AI: The Tumultuous History of the Search for Artificial Intelligence (1993).
The second is the connectionist or machine learning ‘branch’ of artificial intelligence, which proposes that systems ought to mirror the interaction between neurons of the brain in order to independently learn from data. Artificial neural networks are commonly placed within this group. The historian Aaron Mendon-Plasek convincingly argues that what we know today as machine learning developed from techniques and practices of pattern recognition, which saw the field evolve independently from artificial intelligence in the twentieth century. He suggests that pattern recognition realised a form of learning by borrowing the notion of the loss function from the statistical decision theory of the mathematician Abraham Wald, whose definition of ‘loss’ described the concept as the arithmetic sum of the loss due to the decision made and the cost of the observations.
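To give a rough schematic sense of that formulation (the notation here is mine, not Wald’s or Mendon-Plasek’s), the quantity to be minimised adds the loss attached to the decision itself to the cost of gathering the observations used to reach it:

$$\text{total loss} = L\big(\theta, d(x_1, \dots, x_n)\big) + c(n),$$

where $\theta$ is the true state of nature, $d$ is the decision rule applied to the $n$ observations, and $c(n)$ is the cost of making those observations. Framed this way, a better decision rule is simply one with a smaller expected total loss, which is the sense in which a loss function turns ‘learning’ into an optimisation problem.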

Popular histories of AI tend to follow a seasonal account with ‘summers’ in which research blossoms and ‘winters’ in which research fades. While there have been instances in which one school thrived while the other struggled, so too have there been overlaps between the cold spells experienced by each. I discuss some of these ups and downs below, but there are generally thought to have been two prominent AI winters. First, the winter that began in 1973 following the publication of the Lighthill report from the UK Science Research Council, which concluded that symbolic-focused approaches had run into a wall. Though originally showing promise, these rules-based systems struggled to adapt to ambiguous situations, lacked scalability, and had difficulty representing common sense knowledge.
Separately, in 1969, MIT researchers Seymour Papert and Marvin Minsky wrote Perceptrons: An Introduction to Computational Geometry. The book is generally considered to have halted the progress of connectionism for the next decade (though this narrative doesn’t fully hold for reasons that should become clear). The second winter is generally said to have appeared at the end of the 1980s, when connectionist systems (which returned to prominence in the intervening years) ran out of steam. Later, it would become clear that the problem was for the most part one of scale. As Rich Sutton’s Bitter Lesson reminds us: “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”
I have some problems with seasonal accounts, though I should make it clear that I do believe there was a downturn from the 1990s to the early 2010s. Whatever the case, the most effective arguments against the concept tend to point out that many breakthroughs occurred in the middle of periods that have been described as AI winters. Consider backpropagation, the automatic adjustment of weights in an artificial neural network (and one of the most important developments in the field). In 1974, Harvard PhD student Paul Werbos proposed backpropagating errors to train multilayer artificial neural networks, while a version of the algorithm was also described by Finnish mathematician Seppo Linnainmaa in 1970. In 1969, Arthur E. Bryson Jr used a version of backpropagation when he introduced iterative gradient methods for solving Euler–Lagrange equations. Each of these came long before similar work by psychologist David Rumelhart and computer scientist Geoff Hinton in 1986, often referred to as a pivotal moment in connectionist lore. Underscoring the long history of the technique, Yann LeCun wrote in a paper edited by Hinton in 1988 that backpropagation “had been used in the field of optimal control long before its application to connectionist systems.” Optimal control theory is, for those interested, a form of mathematical optimisation concerned with finding a control for a dynamical system (rather than a static system) over a period of time for the purpose of optimising an objective function. Sutton also identifies optimal control theory as the precursor to what would become reinforcement learning.
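Since backpropagation recurs throughout this story, it is worth pausing on what the algorithm actually does. In standard modern notation (not that of Werbos, Linnainmaa, or Bryson), for a feedforward network with pre-activations $z^l = W^l a^{l-1} + b^l$ and activations $a^l = \sigma(z^l)$, the gradient of a cost $C$ is obtained by propagating an error term backwards through the layers:

$$\delta^L = \nabla_{a^L} C \odot \sigma'(z^L), \qquad \delta^{l} = \big((W^{l+1})^{\top} \delta^{l+1}\big) \odot \sigma'(z^{l}), \qquad \frac{\partial C}{\partial W^{l}} = \delta^{l} \,(a^{l-1})^{\top},$$

after which each weight is nudged against its gradient, $W^l \leftarrow W^l - \eta \, \partial C / \partial W^l$, for some learning rate $\eta$.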
Back to the start
So, as you can see, the history (and historiography) of AI is a bit of a mess. But amidst the disorder, an important question remains. If 1956 shouldn’t be considered the start of the field, then when should it be?
Before I offer my own answer, I am going to start by providing a few different options. We could begin with the conceptual origins of AI: Talos (a bronze automaton crafted by Hephaestus in ancient Greece), the golem (humanoid creatures from Hebrew mythos crafted from clay), or the bhuta vahana yanta (mechanical guards from Indian history). There's the automaton built by Chinese engineer Yan Shi, the 'Book of Ingenious Devices' detailing various automata compiled in the 9th century by the Banū Mūsā brothers, and plenty of others besides.
Alternatively, we could start in the 1820s with Charles Babbage’s Difference Engine, a mechanical calculator intended to automate the production of mathematical tables used for maritime navigation. At the time, such tables were created manually, a process fraught with potential for human error. Babbage, an English mathematician and inventor who inherited the equivalent of millions of pounds in today’s money from his father, spent over a decade trying to construct the machine but never managed to fully complete it. The historian Simon Schaffer argues that Babbage was able to describe the possibility of the engines as having ‘volition and thought’ only by overlooking the role of machine designers and operators whose input would be essential to their reliable functioning. This dynamic, of obscuring the contingencies on which AI systems are based to make rhetorical claims about their performance, is unfortunately all too common in the history of AI. Nonetheless, as the project stalled, Babbage turned to the production of a more ambitious device known as the Analytical Engine. The design included features that anticipated modern computers: versions of a central processing unit, memory, and even output devices like a printer and plotter. Ada Lovelace, daughter of British poet Lord Byron, wrote an 1843 article describing the steps the engine would take in solving certain mathematical problems - that is, programs - which were the first published descriptions of their kind.
Closer to home is the 20th century. In the 1940s, for example, mathematician and philosopher Norbert Wiener led the development of cybernetics, which studies systems of control and communication in machines and living beings. Cybernetics is a transdisciplinary field exploring regulatory systems, their structures, constraints, and possibilities. The term itself derives from the Greek word for ‘steersman’ (a person who steers a boat or ship) and is broadly concerned with systems that interact with themselves and produce their own behaviour. The core concept of cybernetics is the feedback loop, a mechanism whereby current performance influences future performance in a given system. For example, a home thermostat is a simple cybernetic system: it monitors temperature and adjusts the heating to maintain a desired temperature over time.
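To make the feedback loop concrete, here is a deliberately simple sketch (a toy, not a model of any real heating system): the system’s current output feeds back into its next decision, which is the essence of the cybernetic picture.

```python
# A minimal negative-feedback loop in the spirit of the thermostat example.
def simulate_thermostat(target=20.0, outside=5.0, steps=30):
    temperature = outside                         # the room starts cold
    for step in range(steps):
        heater_on = temperature < target          # observe current performance...
        if heater_on:
            temperature += 1.0                    # ...and act on it (heat the room)
        temperature -= 0.05 * (temperature - outside)   # passive heat loss to outside
        print(f"step {step:2d}: {temperature:5.2f} C, heater {'on' if heater_on else 'off'}")

simulate_thermostat()
```

Run it and the temperature climbs towards the target, overshoots slightly, and then oscillates around it as the heater switches on and off: performance now shaping performance later.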
In 1948, Claude Shannon developed information theory, which introduced the bit as a standard unit of information. The concept was originally introduced to find fundamental limits on signal processing operations, especially with respect to data compression and storage. Given its central importance to the modern digital world - and wide-ranging applications across computer science - the development of information theory represents a possible entry point to our story.
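To make the ‘bit’ a little more concrete (this is a standard textbook illustration rather than anything lifted from the 1948 paper itself): Shannon measured the information produced by a source by its entropy,

$$H(X) = -\sum_i p_i \log_2 p_i \ \text{bits},$$

so a fair coin flip carries exactly one bit, while a heavily biased coin carries less than one bit per flip; that shortfall is precisely the slack that compression schemes exploit.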
Of course, no reflection - no matter how brief - on the history of computing is complete without a mention of Hungarian-American polymath John von Neumann. In 1945, von Neumann wrote a report outlining the architecture for a stored-program computer. This design, which would later be known as the von Neumann architecture, consisted of a central processing unit, memory, and devices for both inputting and outputting information. It was, in short, the blueprint for the modern computer. In the 1940s and 1950s, von Neumann began developing the principles of self-replicating machines, what he called ‘automata’. This concept is perhaps best known in the form of the von Neumann probe, a hypothetical self-replicating spacecraft that would travel to a distant solar system, harvest materials from nearby asteroids, and build copies of itself before repeating the process independently and indefinitely.
The 1940s also saw the development of electronic digital computers like the ENIAC, which I wrote about in my history of the National Institute of Standards and Technology. (I will, though, write something on the past and present of compute and computing in the not too distant future.) These machines weren't designed for AI specifically, but they set the stage by increasing computational capabilities and ultimately widening the option space in which AI development could take place.
Then there is Alan Turing, a colossus of computing history. Turing's most significant contributions to our story begin with theoretical work in 1936, in which he proposed the concept of a ‘universal machine’ that could perform the functions of any other. This idea paved the way for the modern general-purpose computer. In 1950, Turing published his paper ‘Computing Machinery and Intelligence,’ in which he proposed his famous imitation game (often referred to as the ‘Turing Test’). The paper proposed that, instead of trying to mimic the adult mind, AI research should strive to simulate the mind of a child, which has “rather little mechanism, and lots of blank sheets.”
Regardless of where we draw the line that divides the history of AI from its prehistory (which has to be somewhere, lest the whole history of humanity be the history of AI), we can calibrate our own views based on the constellation of moments that together form the AI canon. Yann LeCun wrote in 1987 about some of these episodes:
“At the very beginnings of Artificial Intelligence (AI), one could find two main goals, which originally seemed to be considered as two aspects of only one question: design ‘intelligent’ machines and understand human intelligence. J. von Neumann wanted a theory to establish the logical differences -if any- between ‘natural’ and ‘artificial’ automata. McCulloch and Pitts had shown how neurons, similar to the nervous cells, could be used to achieve the same computational performances as a Turing machine. The question of how to make them learn to compute was then raised by Rosenblatt, who tried to build a learning machine, the ‘perceptron’, based on such components.”
I will get to a survey of these moments and others, but before that, consider the twin goals of designing intelligent machines and understanding human intelligence. These objectives (or rather, their mutually reinforcing nature) are one of the central reasons that interest in connectionism has remained buoyant for the better part of a century. An illustrative example can be found in the 1980s. Connectionism was thought to be representative of radical behaviourism, a popular theory developed by American psychologist B.F. Skinner that posited that human will and agency could be understood as products of external stimuli. Seymour Papert, who wrote Perceptrons with Minsky, suggested that the “behaviorist process of external association of stimuli with reinforcements” generated a “cultural resonance” between behaviourist interpretations of mind and connectionism. The ‘evidence’ provided by neural networks for behaviourist understandings of mind proved to be a self-reinforcing phenomenon; artificial neural networks ‘confirmed’ the behaviourist perspective, which, in turn, drove interest in connectionist models that benefitted from suggestions that they represented human cognition.
It’s clear, then, that AI history is not just about hardware, data or algorithms. Its formation as an independent discipline was predicated on, and conditioned by, a collection of distinct fields, ideas, and values. I try to weave both types of contingencies (AI-specific technical breakthroughs and influences from farther afield) into a simplified narrative below. And while I am not going to get into each of these in any detail in this post, consider this a promise that I will return to them in future editions of Learning From Examples.
A simple history of AI
As you can see, there are lots of places we could begin. But that’s not really a satisfying answer, even if it does draw into focus the messy, complex nature of this thing we call the history of AI. So, to nail my own colours to the mast, I am going to start in the 1930s with the creation of abstract mathematical models used to represent nerve cells. The reason for this is simple: this moment, while not quite the starting point for connectionism, directly laid the groundwork for the field. Connectionism eventually gave way to the deep learning era, which ultimately led to the large models that we refer to as AI today. If we want to continue calling Bard, Claude, and ChatGPT ‘AI’ (and we ought to, I think), this is a good place to start. The 1930s also saw the birth of operations research, whose aim was “to examine quantitatively whether the user organisation is getting from the operation of its equipment the best attainable contribution to its overall objective.” Operations research would eventually become optimal control theory, which together with pattern recognition (a field that also evolved in this period) represented two essential influences on the growth of connectionism. For the sake of time, however, I will focus on the collision of mathematics and biology.
We start with Nicolas Rashevsky, a Russian-American biologist who was instrumental in creating the field of mathematical biology. Rashevsky, who later mentored symbolic AI pioneer Herbert Simon, helped to establish a mathematical foundation for understanding biological systems such as neurons. Throughout the 1930s, working at the University of Chicago, he developed equations to articulate the processes by which neurons interact with each other, describing the functioning of nerve cells in the language of mathematics. While even the largest contemporary models resemble the structure of biological neural networks in only a very loose manner, we should not underestimate the power of these comparisons - whether wholly accurate or not - as a source of inspiration for connectionist researchers. Representing the functioning of nerve cells in mathematical terms was a watershed moment: if real neurons could be described through equations, it followed that versions of their biological processes might be represented in artificial structures.
That moment came in 1943 with the introduction of the McCulloch-Pitts neuron, which the pair developed by directly building on Rashevsky’s work. In their paper, ‘A Logical Calculus of the Ideas Immanent in Nervous Activity,’ Warren McCulloch and Walter Pitts created a mathematical model that acted as a simplified abstraction of a biological neuron. It took a set of binary inputs, multiplied them by corresponding weights, summed them, and then passed the result through a step function that outputted a binary value. If the sum exceeded a certain threshold, the neuron ‘fired’ (outputting a 1); otherwise it remained inactive (outputting a 0).
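Rendered as code, the behaviour described above looks something like the sketch below. The weights and threshold are hand-picked for illustration, and the 1943 formulation was rather more constrained than this, but the weighted-sum-plus-threshold structure is the same.

```python
def mcculloch_pitts_unit(inputs, weights, threshold):
    """A simplified McCulloch-Pitts-style unit: weighted sum plus a hard threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0   # 1 = the neuron 'fires', 0 = inactive

# With these hand-picked values the unit behaves like a logical AND gate:
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mcculloch_pitts_unit([a, b], weights=[1, 1], threshold=1.5))
```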
The original McCulloch-Pitts model did not include a ‘learning mechanism’ (that is, adjustable weights that could result in different outputs). It would take until 1958 for Cornell University’s Frank Rosenblatt to introduce the ‘perceptron’, which was one of the first algorithms that demonstrated the principle of dynamic change in an artificial neural network. Like the McCulloch-Pitts neuron, the perceptron took a set of binary inputs, multiplied them by the corresponding weights, and produced a binary output. Unlike the McCulloch-Pitts neuron, however, the perceptron was designed to automatically find the optimal weights to classify its inputs correctly. This was achieved through a simple rule: if the perceptron made a mistake on an input, it would adjust its weights in the direction that would make it more likely to classify that input correctly in the future. The perceptron, which was originally funded to help classify images for U.S. military intelligence, caused a wave of sensationalist reporting.
The Navy last week demonstrated the embryo of an electronic computer named the Perceptron which, when completed in about a year, is expected to be the first non-living mechanism able to "perceive, recognize and identify its surroundings without human training or control."
The New York Times, July 13, 1958
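Setting the breathless coverage aside, the learning rule described above is simple enough to sketch in a few lines. What follows is a toy reconstruction of the idea, not Rosenblatt’s exact procedure or his Mark I hardware: a single unit trained on the (linearly separable) logical OR function.

```python
# Toy perceptron: learn weights that separate two classes of binary inputs (logical OR).
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

for epoch in range(10):
    for x, target in examples:
        error = target - predict(x)   # 0 if correct, +1 or -1 if wrong
        # If the prediction was wrong, nudge the weights towards the correct answer.
        weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
        bias += learning_rate * error

print(weights, bias, [predict(x) for x, _ in examples])   # ends with [0, 1, 1, 1]
```

After a handful of passes over the four examples the weights settle on values that classify every input correctly, which is the behaviour the later perceptron convergence theorem guarantees for any linearly separable problem.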
In the years following the announcement, researchers moved to test the claims. As I mentioned, in 1969 Minsky and his longtime friend and collaborator Seymour Papert published Perceptrons: An Introduction to Computational Geometry. The core idea behind the book’s critique was that single-layer perceptrons could only solve linearly separable problems. This means that if you can't draw a straight line (or, in higher dimensions, a hyperplane) to separate the different classes of data, a single-layer perceptron can't solve the problem. They were also deeply sceptical that layering such units would overcome the limitation, a judgement that would eventually be shown to be mistaken (though not before severely damping interest in connectionist approaches).
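The canonical illustration, a textbook staple rather than a verbatim argument from the book, is XOR (exclusive or). A single threshold unit that outputs 1 when $w_1 x_1 + w_2 x_2 > \theta$ would have to satisfy, across XOR’s four input cases:

$$0 \le \theta, \qquad w_1 > \theta, \qquad w_2 > \theta, \qquad w_1 + w_2 \le \theta.$$

The first three conditions force $w_1 + w_2 > 2\theta \ge \theta$, which contradicts the fourth, so no choice of weights (equivalently, no single straight line through the input space) can reproduce XOR.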
Though he had worked on ‘learning machines’ of his own, such as the stochastic neural analogue reinforcement calculator (SNARC) built in the 1950s, by the time Perceptrons was published Minsky was advocating for systems representative of the rules-based symbolic approach. In a 1968 article in the New York Times, Minsky outlined a proof of concept for a machine designed to support construction engineers. Fundamental to the success of such machines, the article explained, was the ability for a system to ‘know’ the intrinsic qualities of objects—a feature heavily associated with symbolic systems. As an aside, Minsky was the primary technical consultant for the computer system HAL in Stanley Kubrick’s 2001: A Space Odyssey. Reflecting on his advice, he stated that HAL - the Heuristically Programmed Algorithmic Computer - was “supposed to have the best of both worlds” in reference to the use of case-based reasoning and heuristics representative of symbolic AI.
While my story is primarily focused on the development and progression of connectionism, we should remember that rules-based systems eclipsed their connectionist counterparts in popularity for much of the 20th century. I will not spend too long on them (this article is long enough as it is), but there are a few important moments that we ought to consider related to the discipline now referred to as ‘good old-fashioned AI’ or GOFAI. In 1959, for example, John McCarthy proposed a program that could accept high-level instructions or ‘advice’ in a human-readable form, while in 1976 Herbert Simon and Allen Newell argued that any system capable of manipulating symbols (such as a computer) can exhibit intelligent behaviour similar to that of a human. There was also the heyday of ‘expert systems’ in the 1980s, which found some use in commercial settings to, for example, configure computer orders and improve their accuracy.
The 1980s, however, are primarily remembered for a resurgence in connectionism. Bruce MacDonald, a computer scientist writing in 1987, argued that the renewal was underpinned by dissatisfaction with symbolic manipulation and by the emergence of hardware powerful enough to satisfy a new generation of computationally intensive neural networks. There was, of course, also the matter of the popularisation of backpropagation by David Rumelhart and Geoff Hinton in 1986 and the Hopfield network developed by physicist John Hopfield in 1982. Hopfield was widely regarded by colleagues as responsible for an increased interest in connectionist methods, with researcher Tom Schwartz telling Popular Science in 1989 that ‘Hopfield should be known as the fellow who brought neural nets back from the dead.’ As above, though, versions of the backpropagation algorithm had already been developed elsewhere. Similarly, the type of network described by Hopfield in 1982 had previously been introduced by Stanford researcher William Little in 1974 and by Japanese researcher Shun'ichi Amari in 1972. Despite this, there is little doubt that the 1980s proved the potential of artificial neural networks through commercial applications. A team at Bell Labs, for example, used backpropagation to develop systems capable of successfully recognising handwriting on postal envelopes. In doing so, the group made major steps forward in the design of convolutional neural networks and time-delay neural networks, while introducing techniques such as weight pruning (then called ‘optimal brain damage’) to significantly boost performance.
This is where my history of AI ends. There’s much, much more to say, from the development of reinforcement learning (beginning with optimal control theory and energised by research from Sutton and Barto) to the dawn of the deep learning era. This post doesn’t have enough space to discuss the 2010s, so I will have to leave out discussion of AlexNet, AlphaGo, and the development of the transformer architecture that underpins our current era of very large models. You can, however, read a good technical history covering this period by Juergen Schmidhuber if you are so interested.
Where do we go from here?
It should be obvious to you, I hope, that I do not believe that the origins of modern AI can be neatly traced back to a single moment in time, and certainly not to the summer of 1956. Ultimately, the history of AI isn’t just about algorithms and architectures. Nor is it solely about data or processing power. The story of AI is about the disciplines and influences that allowed technical breakthroughs to happen as much as it is about those technical milestones and the people who made them. That isn’t to downplay the role of researchers, but merely to recognise that progress does not happen in a vacuum.
There are of course relationships and personalities to account for, not to mention the long shadow cast by corporate, academic and military institutions whose influence shaped the direction of research for much of AI’s history. I have only hinted at the fields to which AI owes so much: statistics, pattern recognition, cybernetics, optimal control theory, operations research, economics, behavioural psychology and of course neuroscience. The links between these areas and AI are some of the topics I’ll be looking at in the history-focused editions of Learning From Examples. I hope you’ll join me for the ride.