Computerphile
July 2, 2026
TL;DR
AI tokens (words or word pieces) are billed on both input and output. This makes coding agents extraordinarily expensive, because they must re-read the entire conversation context on every step, and costs for simple tasks scale from thousands to millions of tokens.
“language models don't work that way. Everything goes in every single time.”
— Host
“It's like measuring my quality as a driver by how quickly I wear through my tires. That is an unsustainable practice.”
— Host
“When you have an incentive like that, of course people are going to ask really long-winded questions... and in a surprise to absolutely no one, that is completely unsustainable in terms of cost.”
— Host
1. What Is a Token?
Tokens are the basic units of AI processing: words, word fragments, punctuation, spaces, and special characters. How text splits into tokens depends on the language and the tokenizer; a Chinese character may take 1–2 tokens. Modern models use a vocabulary of roughly 100,000 distinct tokens, including code symbols and Unicode characters.
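To make the idea of word pieces concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `VOCAB` and the function `tokenize` are hypothetical and chosen only for illustration; real tokenizers learn a far larger vocabulary from data and typically use a different algorithm such as byte-pair encoding.

```python
# Toy greedy longest-match tokenizer. VOCAB is a hypothetical, hand-picked
# vocabulary; real tokenizers learn theirs from data and are much larger.
VOCAB = ["token", "izer", "s", " ", "un", "break", "able", "!"]


def tokenize(text: str, vocab: list[str]) -> list[str]:
    """Split text into the longest matching vocabulary pieces, left to right."""
    pieces = sorted(vocab, key=len, reverse=True)  # try longer pieces first
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No vocabulary piece matches: fall back to a one-character token.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = tokenize("unbreakable tokenizers!", VOCAB)
print(tokens)
print(len(tokens), "tokens for", len("unbreakable tokenizers!"), "characters")
```

Two words and an exclamation mark come out as eight tokens here, which is why token counts are usually higher than word counts.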
2. How Models Generate Output
AI models work autoregressively: they take an input context, make many decisions, and output one token at a time. Each new token is appended to the context, and generating the next one requires the model to process the entire context again, which makes the process computationally expensive.
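The loop described above can be sketched as follows. `next_token` is a hypothetical stand-in for a real model and only returns a placeholder string; the point of the sketch is the counter, which shows how much context is read when nothing is cached.

```python
def next_token(context: list[str]) -> str:
    """Hypothetical stand-in for a model: returns a placeholder token."""
    return f"tok{len(context)}"


def generate(prompt: list[str], n_new: int) -> tuple[list[str], int]:
    """Generate n_new tokens and count tokens processed without any caching."""
    context = list(prompt)
    processed = 0
    for _ in range(n_new):
        processed += len(context)            # the whole context is read again
        context.append(next_token(context))  # one new token per iteration
    return context, processed


context, processed = generate(["a", "b", "c"], n_new=4)
print(len(context), processed)  # 7 tokens in context, 3 + 4 + 5 + 6 = 18 processed
```

Under these assumptions, four new tokens require reading 18 tokens of context, and the gap widens as the context gets longer.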
3. Context Window Growth and Cost Escalation
As a conversation continues, the input context grows cumulatively rather than resetting: initial query (100 tokens) + system prompt (1,000) + first thought (10,000) + follow-up query (200) are all sent together on the second turn. Even a short follow-up therefore costs several times more input than the first query.
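The arithmetic can be checked with a short sketch. It uses the token counts quoted above and assumes that the first request carries only the system prompt and the initial query, and that the first thought joins the history afterwards. The size of the second reply is not given, so it is set to 0 here.

```python
SYSTEM_PROMPT = 1_000  # tokens, from the figures above

# Each turn: tokens typed by the user and tokens produced by the model.
turns = [
    {"user": 100, "model": 10_000},  # initial query, then the first thought
    {"user": 200, "model": 0},       # follow-up; reply size not given (assumed 0)
]


def input_sizes(system_prompt: int, turns: list[dict]) -> list[int]:
    """Return the number of input tokens sent on each turn."""
    sizes = []
    history = system_prompt
    for turn in turns:
        history += turn["user"]
        sizes.append(history)     # everything so far is sent as input
        history += turn["model"]  # the reply joins the history for the next turn
    return sizes


sizes = input_sizes(SYSTEM_PROMPT, turns)
print(sizes)  # [1100, 11300]
```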
4. KV Caching and System Optimization
KV caching stores the network's intermediate representations of earlier tokens, so the relationships between them do not have to be recalculated for every new token. However, caches are short-lived because GPU memory is limited: when a user returns after a delay, the cache may already have been evicted and must be 're-filled' by processing the whole context again.
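A toy model of this behaviour is sketched below. The class `PrefixCache`, its `ttl_seconds` parameter, and the 5-minute lifetime are all hypothetical; a real serving system manages cache memory in far more detail. The sketch also assumes the context only ever grows by appending.

```python
class PrefixCache:
    """Toy model of a KV cache: tracks how many context tokens are cached."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.cached_tokens = 0
        self.last_used = None

    def tokens_to_process(self, context_len: int, now: float) -> int:
        """Return how many tokens must be computed for this request."""
        expired = self.last_used is None or now - self.last_used > self.ttl
        if expired:
            self.cached_tokens = 0  # cache evicted: re-fill from scratch
        work = context_len - self.cached_tokens
        self.cached_tokens = context_len
        self.last_used = now
        return work


cache = PrefixCache(ttl_seconds=300)  # hypothetical 5-minute lifetime
quick = [
    cache.tokens_to_process(1_100, now=0),    # cold start: everything computed
    cache.tokens_to_process(11_300, now=60),  # cache hit: only the new tokens
]
late = cache.tokens_to_process(11_500, now=1_000)  # user returned too late
print(quick, late)  # [1100, 10200] 11500
```

The last request adds only 200 tokens to the conversation, yet all 11,500 have to be processed because the cache expired in the meantime.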
5. Chatbots vs. Coding Agents
With a simple chatbot, users ask brief questions and receive brief responses, so token costs stay manageable. Coding agents act autonomously: they read files via tool calls, generate internal thoughts, and resend all previous context with every query, causing token usage to explode.
6. Real-World Example: Bug-Fix Request
A simple bug-fix prompt (4,200 input tokens) triggers multiple file reads (~5,000 tokens each), internal thoughts (~2,000 tokens each), tool calls (~100 tokens), and code patches (~1,500 tokens), totaling ~55,000–60,000 tokens for one task.
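The sketch below shows why the billed input can exceed the size of the final context. The per-step sizes come from the figures above, but the order and number of steps in `STEPS` are assumptions, as is the rule that every model output is a separate request that re-reads the full context. It is not an attempt to reproduce the quoted total.

```python
# Per-step sizes are from the text; the sequence itself is hypothetical.
PROMPT = 4_200
STEPS = [
    ("thought", 2_000),
    ("tool_call", 100),
    ("file_read", 5_000),  # tool result, added to the context as input
    ("thought", 2_000),
    ("patch", 1_500),
]
MODEL_OUTPUTS = {"thought", "tool_call", "patch"}


def simulate(prompt: int, steps: list[tuple[str, int]]) -> tuple[int, int, int]:
    """Return (final context size, billed input tokens, billed output tokens)."""
    context = prompt
    billed_input = 0
    billed_output = 0
    for kind, size in steps:
        if kind in MODEL_OUTPUTS:
            billed_input += context  # assumed: one request per model output
            billed_output += size
        context += size              # every step stays in the context
    return context, billed_input, billed_output


final_context, billed_input, billed_output = simulate(PROMPT, STEPS)
print(final_context, billed_input, billed_output)  # 14800 35000 5600
```

In this short hypothetical run the context ends at 14,800 tokens, but 35,000 input tokens are billed because the earlier content is read four times.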
7. GitHub Copilot Pricing Model Change
GitHub Copilot switched from flat-rate monthly subscriptions to per-token billing, revealing that simple agentic tasks consume millions of tokens. A six-prompt starfield code example used 2 million input tokens and 47,000 output tokens.
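Per-token billing is simple arithmetic once prices are known. The sketch below applies it to the starfield figures; the prices per million tokens are placeholders invented for illustration and are not GitHub Copilot's actual rates.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of a session given prices per million input and output tokens."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Token counts from the starfield example; both prices are hypothetical.
total = cost_usd(2_000_000, 47_000, usd_per_m_input=3.0, usd_per_m_output=15.0)
input_share = (2_000_000 * 3.0 / 1_000_000) / total
print(f"${total:.2f}, input share {input_share:.0%}")
```

Even with output priced at five times the input rate in this example, input dominates the bill, because the context is re-read so many times.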
8. The Perverse Incentive Problem
Token-based billing encourages users to ask longer, more complex questions and use more expensive models with longer thinking times, creating a death spiral of cost escalation that is unsustainable for non-tech companies.
9. Sustainable Use Cases and Future Outlook
Efficient uses include small, succinct bug fixes, code completion (finishing half-written loops), and quick-fix scenarios requiring minimal context. Full agentic AI remains worryingly costly; companies must prove immediate ROI to justify expenses.