Fireship
June 17, 2026
TL;DR
A retrospective of 10 landmark computer science papers from Turing's 1936 work through OpenAI's GPT-3, tracing how foundational ideas in computation, information theory, neural networks, and scaling converged to create modern AI.
“Shannon wasn't trying to build artificial intelligence, but he gave us the math for uncertainty, prediction, and compression and accidentally wrote the spiritual ancestor to the loss function.”
— Narrator
“OpenAI takes the transformer and then asks the dumbest question possible. What if we just make it enormous?”
— Narrator
“Intelligence isn't some secret algorithm we're missing, but rather it simply emerges once you cross a threshold of scale.”
— Narrator
1. The Birth of Computing: Turing and the Halting Problem
Alan Turing's 1936 paper 'On Computable Numbers' answered Hilbert's decision problem in the negative, proving that no general algorithm can settle every mathematical question. Along the way, he defined the Turing machine, the abstract blueprint for all modern computers.
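To make the Turing machine idea concrete, here is a minimal sketch of a one-tape simulator. The rule format, the state names (`start`, `halt`), and the bit-flipping example machine are illustrative assumptions, not taken from Turing's paper. The step cap is there because, as the halting problem implies, the simulator cannot tell in general whether an arbitrary machine will ever stop.

```python
def run_turing_machine(tape, rules, state="start", blank="_", max_steps=1000):
    """Simulate a one-tape Turing machine until it enters the 'halt' state.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "L" or "R". All names here are hypothetical.
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            # We cannot decide halting in general, so we simply give up.
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Example machine (an assumption for illustration): flip every bit, halt on blank.
FLIP_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine("1011", FLIP_BITS))  # prints 0100
```

A machine whose rules loop forever would hit the step limit instead of returning, which is the practical face of undecidability.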
2. Information Theory: Shannon's Bits and Entropy
Claude Shannon's 1948 paper 'A Mathematical Theory of Communication' reduced all human communication to ones and zeros, introducing the bit as a unit of information and entropy as a measure of uncertainty. This framework later became the spiritual ancestor to modern AI loss functions.
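As a rough illustration of entropy and its link to loss functions, the sketch below computes Shannon entropy in bits, H = -Σ p(x) log2 p(x), from symbol frequencies, plus the cross-entropy between two distributions. The function names and the toy distributions are assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy in bits per symbol, estimated from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def cross_entropy(p, q):
    """Average bits per symbol when data follows p but the code assumes q."""
    return -sum(p[s] * math.log2(q[s]) for s in p if p[s] > 0)


print(shannon_entropy("abab"))  # prints 1.0: two equally likely symbols = 1 bit
print(shannon_entropy("abcd"))  # prints 2.0: four equally likely symbols = 2 bits

true_dist = {"a": 0.5, "b": 0.5}
model_dist = {"a": 0.25, "b": 0.75}  # a hypothetical, imperfect model
print(cross_entropy(true_dist, model_dist))  # larger than the 1.0-bit entropy
```

Cross-entropy is never smaller than the entropy of the true distribution, and the gap shrinks as the model's predictions improve. That is why minimizing it works as a training objective.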
3. Neural Networks Emerge: The Perceptron and First AI Winter
The perceptron (1958), inspired by biological neurons, was one of the first machine learning algorithms: a single artificial neuron that learns its weights from labeled examples. However, a 1969 analysis by MIT researchers Marvin Minsky and Seymour Papert showed that single-layer perceptrons cannot learn exclusive-or (XOR). AI funding dried up for years, even though stacking layers solves the problem.
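The XOR limitation is easy to reproduce. The sketch below trains a single perceptron with the classic error-driven update rule on AND (linearly separable) and on XOR (not separable), then shows a hand-wired two-layer network that does compute XOR. The learning rate, epoch count, and hidden-unit weights are assumptions chosen for illustration.

```python
def step(z):
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Single-layer perceptron trained with the error-driven update rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def accuracy(samples, w, b):
    hits = sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in samples)
    return hits / len(samples)


def two_layer_xor(x1, x2):
    """Hand-wired hidden layer: XOR = AND(OR(x1, x2), NAND(x1, x2))."""
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(-x1 - x2 + 1.5)
    return step(h_or + h_nand - 1.5)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(accuracy(AND, *train_perceptron(AND)))  # prints 1.0
print(accuracy(XOR, *train_perceptron(XOR)))  # stays below 1.0
print([two_layer_xor(x1, x2) for (x1, x2), _ in XOR])  # prints [0, 1, 1, 0]
```

No single line can separate the XOR cases, so the one-layer model can never score 100% however long it trains. The two-layer version works, but its weights here were set by hand, because at the time there was no practical way to learn them.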
4. Distributed Systems: Lamport's Logical Clocks
Leslie Lamport's 1978 paper 'Time, Clocks, and the Ordering of Events in a Distributed System' showed how multiple computers without a shared clock can still agree on the order of events, using logical timestamps that respect causality. The same ideas underpin the infrastructure that coordinates thousands of GPUs in modern AI training.
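A logical clock can be sketched in a few lines. Each process keeps a counter, increments it on every local event, attaches it to outgoing messages, and on receipt jumps past the sender's timestamp. The class and method names below are assumptions for illustration.

```python
class LamportClock:
    """Minimal logical clock: a counter that respects cause and effect."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """A local event happened."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Move past both our own time and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()                  # local event on A, clock = 1
stamp = a.send()          # A sends a message stamped 2
print(b.receive(stamp))   # prints 3: B's clock jumps ahead of the send event
```

The guarantee is one-directional: if one event causally precedes another, its timestamp is smaller. A smaller timestamp alone does not prove causality.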
5. Backpropagation: Training Deep Networks
After 17 years of AI winter, researchers including Geoffrey Hinton popularized backpropagation in 1986: run data forward, measure the error, and push it backward through the layers with the chain rule from calculus to adjust the weights. The work showed that hidden layers learn useful features, such as edges and shapes, on their own.
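The forward/backward procedure can be sketched on a tiny network with two inputs, two sigmoid hidden units, and one sigmoid output. Biases are omitted, and the weights, input, and learning rate are arbitrary assumptions. The gradients from backpropagation are checked against a finite-difference estimate to confirm that the chain-rule bookkeeping is right.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, W2):
    """Run data forward: inputs -> hidden activations -> output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)))
    return h, y


def loss(x, target, W1, W2):
    _, y = forward(x, W1, W2)
    return 0.5 * (y - target) ** 2


def backprop(x, target, W1, W2):
    """Push the error backward with the chain rule; returns (grad_W1, grad_W2)."""
    h, y = forward(x, W1, W2)
    delta_out = (y - target) * y * (1 - y)  # dLoss/d(output pre-activation)
    grad_W2 = [delta_out * hi for hi in h]
    delta_hidden = [delta_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, grad_W2


def numerical_grad_W1(x, target, W1, W2, eps=1e-6):
    """Slow finite-difference gradient, used only to verify backprop."""
    grads = []
    for i in range(len(W1)):
        row = []
        for j in range(len(W1[i])):
            plus = [r[:] for r in W1]
            minus = [r[:] for r in W1]
            plus[i][j] += eps
            minus[i][j] -= eps
            row.append((loss(x, target, plus, W2) - loss(x, target, minus, W2)) / (2 * eps))
        grads.append(row)
    return grads


x, target = [1.0, 0.5], 1.0
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [0.5, -0.6]

grad_W1, grad_W2 = backprop(x, target, W1, W2)
lr = 0.1  # one small gradient-descent step
new_W1 = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, grad_W1)]
new_W2 = [w - lr * g for w, g in zip(W2, grad_W2)]
print(loss(x, target, W1, W2), "->", loss(x, target, new_W1, new_W2))
```

Backpropagation gets every gradient from one forward and one backward pass, whereas the finite-difference check needs two extra forward passes per weight. That efficiency is what makes training deep networks feasible.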
6. Web Scale Data: PageRank and Google
Larry Page and Sergey Brin's 1998 PageRank algorithm ranked web pages by treating each link as a vote, weighted by the rank of the page casting it. The web index Google built on top of it became one of the largest organized corpora of human text ever assembled, the kind of data later used to train AI models.
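The "weighted votes" idea can be sketched as a simple iteration: each page repeatedly splits its current rank among the pages it links to. The four-page graph and the iteration count are assumptions for illustration, and 0.85 is the commonly cited damping factor. The sketch assumes every linked page also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute rank along links until the scores settle."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B, and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # prints ['C', 'A', 'B', 'D']
```

Page A ranks second despite having only one inbound link, because that link comes from the highest-ranked page. That is the trust weighting at work.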
7. Deep Learning Breakthrough: AlexNet and ImageNet
In 2012, Alex Krizhevsky trained a deep convolutional neural network on ImageNet (millions of labeled photos) using consumer GPUs. AlexNet cut top-5 image classification error by roughly 10 percentage points in a single year, proving deep learning works at scale with the right data and compute.
8. The Transformer Revolution: Attention Is All You Need
The 2017 'Attention Is All You Need' paper introduced the Transformer architecture, replacing sequential token processing with self-attention that lets every word attend to every other word simultaneously. This solved long-range dependency problems and became the foundation for all modern LLMs including GPT.
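Self-attention can be sketched without any framework. Each token's query is compared against every token's key, the scaled scores are turned into weights with softmax, and the output is the weighted average of the values. As a simplifying assumption, the example reuses the raw token vectors as queries, keys, and values, whereas a real Transformer applies learned projections first. The toy vectors are made up.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy "token" vectors (hypothetical embeddings).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1
```

Every row of weights is computed independently of the others, so all tokens can be processed in parallel instead of one step at a time.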
9. Scaling Laws: GPT-3 and Emergent Intelligence
OpenAI's 2020 'Language Models are Few-Shot Learners' paper scaled the Transformer to 175 billion parameters on internet-scale data. GPT-3 suggested that new capabilities emerge at sufficient scale, enabling zero- and few-shot translation, summarization, and code generation from prompts alone, without task-specific fine-tuning.
10. The AI Era: From Theory to Trillion-Dollar Products
The path from GPT-3 to ChatGPT showed how scaling insights turned into trillion-dollar products. At its core, modern AI still performs the same kind of next-token prediction Shannon described in 1948, only at an incomprehensibly larger scale.