Lex Fridman
March 23, 2026
1. Extreme Co-Design: The New Paradigm
Jensen explains why distributed computing at massive scale requires co-designing GPU, CPU, memory, networking, switching, power, and cooling together. He describes his management structure: more than 60 direct reports, mostly engineers, who work through problems together as a group rather than in one-on-ones, making extreme co-design a company philosophy as well as an engineering method.
2. CUDA: The Existential Bet That Changed Everything
Jensen recounts how NVIDIA made the critical 2007 decision to put CUDA on consumer GeForce GPUs, even though it added 50% to costs and crushed gross margins. The gamble created the install base through which developers discovered CUDA. Market cap fell from $8B to $1.5B (a drop of roughly 81%), but the company clawed its way back over a decade because GeForce took CUDA to the researchers, students, and gamers who became pioneers in deep learning.
3. Shaping Belief Systems: Leadership Through Conviction
Jensen describes his approach to strategic decisions: he develops deep conviction through reasoning, then gradually shapes the belief systems of employees, board, partners, and customers through consistent communication over months or years. By announcement day, everyone says 'what took you so long?' This method applies to internal strategy (deep learning pivot), acquisitions (Mellanox), and industry-wide decisions (HBM adoption).
4. Four Scaling Laws: Beyond Pre-Training
NVIDIA identified four scaling laws powering AI: pre-training (data scale), post-training (synthetic data generation), test-time (inference/reasoning at compute cost), and agentic (spawning sub-agents for team scaling). These form a cycle where agentic systems generate data fed back to pre-training, enabling continuous scaling limited primarily by compute availability.
5. Architecture Flexibility vs. Specialization
CUDA balances specialization (GPU acceleration) with generalization (adaptability to changing algorithms). Jensen discusses how mixture-of-experts required NVLink 72 instead of NVLink 8, and how Grace Blackwell racks, redesigned for LLM inference, evolved into Vera Rubin racks optimized for agentic systems, with storage accelerators and a new Vera CPU. All of this was anticipated through first-principles reasoning about what digital workers require.
6. Power, Grid Efficiency, and Supply Chain
Power consumption is a blocker, but Jensen emphasizes improving tokens per second per watt through extreme co-design, citing a 1 million× compute improvement in 10 years versus 100× from Moore's Law. He proposes tapping excess grid capacity (the grid typically runs at about 60% of peak and approaches full use, around 99%, only during extreme weather) by designing gracefully degradable data centers that shift workloads rather than demanding 100% uptime.
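To put the two figures on the same footing, the sketch below converts each ten-year gain into the constant yearly multiplier that would compound to it. This is illustrative arithmetic only: the helper name `annual_factor` is hypothetical, and the assumption of smooth year-over-year compounding is ours, not a claim made in the episode.

```python
def annual_factor(total_gain: float, years: float) -> float:
    """Constant yearly multiplier that compounds to total_gain over `years`.

    Assumes smooth compounding, which is a simplification for illustration.
    """
    return total_gain ** (1.0 / years)


# Figures quoted in the summary: 1,000,000x vs. 100x over 10 years.
co_design = annual_factor(1_000_000, 10)
moore = annual_factor(100, 10)

print(f"co-design:   {co_design:.2f}x per year")
print(f"Moore's Law: {moore:.2f}x per year")
print(f"ratio of the two ten-year gains: {1_000_000 / 100:,.0f}x")
```

Under that assumption, the co-design figure works out to roughly a 4× gain every year, against roughly 1.6× per year for the Moore's Law figure.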
7. Learning from Elon's Systems Engineering Philosophy
Jensen praises Elon's approach to building the Colossus supercomputer in four months: questioning necessity, eliminating waste, being present at the point of action, and creating urgency. Jensen contrasts this with continuous-improvement thinking, advocating instead for engineering from first principles at 'speed of light' limits before optimizing.
8. TSMC: Trust, Technology, and Manufacturing Miracles
Jensen credits TSMC's success to balancing technology excellence with an obsession for customer service, which together create an intangible asset: trust. The two companies have done three decades and hundreds of billions of dollars in business with no contract. He recounts declining Morris Chang's offer of the CEO role because NVIDIA's mission is equally important and requires his full dedication.
9. CUDA as Moat: Install Base, Ecosystem, and Velocity
NVIDIA's core advantage is the CUDA install base: millions of developers who trust that the platform will keep improving, whose work reaches hundreds of millions of devices across clouds and industries, and who target CUDA first in open-source projects. Combined with horizontal ecosystem integration (Google Cloud, Azure, AWS, edge, cars, robots, satellites) and the velocity of annual system redesigns, this creates a defensible moat.
10. From Warehouses to Factories: Computing's Fundamental Shift
Computing has shifted from retrieval (serving pre-recorded files) to generation (producing contextual tokens in real time), which requires orders of magnitude more compute. This transforms computing from a low-margin warehouse (storage) into a high-margin factory (generation). Intelligence becomes a segmented, scalable product with premium tokens, driving GDP acceleration and increasing compute's share of economic value.