Google I/O 2026: Gemini Omni, Spark Agents, and the Shift to Agentic AI

Google’s New Omni And Spark Just Changed AI Forever

Key Takeaways

1. Google's token processing has grown 7x year-over-year to 3.2 quadrillion tokens per month, with Gemini app users exceeding 900 million and AI search features reaching billions of monthly active users.
2. Gemini 3.5 Flash delivers frontier-level AI performance at 4x the speed and less than half the price of competing models like GPT-4.5 and Claude Opus, which could save large enterprises over $1 billion annually on workload shifts.
3. Gemini Omni is a true multimodal world model that generates realistic, physics-accurate video from text, audio, image, and video inputs simultaneously, with editable outputs and SynthID watermarking for authenticity verification.
4. Gemini Spark is a persistent AI agent that runs 24/7 and handles long-horizon tasks across 30+ integrated tools, automating calendar management, email composition, and background research.
5. Google's infrastructure investment has increased 6x to $180-190 billion annually, with new TPU8 chips enabling training across more than 1 million TPUs globally and internal processing of 3+ trillion tokens daily.
6. Antigravity 2.0 has evolved from a coding environment into a full agentic platform with a 12x speed optimization over frontier models, featuring managed agents and WebMCP for browser-based AI agent execution.
7. New consumer-facing features like Docs Live, Ask YouTube, intelligent eyewear, and generative UI in Search bring agentic capabilities directly to users across Google's ecosystem.

Chapters

1. Massive Scale and Adoption Numbers

Google reported steep growth: monthly token processing rose from 9.7 trillion two years ago to 3.2 quadrillion. Gemini app users more than doubled in one year to 900 million, and daily requests increased 7x. AI Overviews and AI search mode now have 2.5 billion and 1 billion monthly active users, respectively.
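The growth figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this summary (9.7 trillion, 3.2 quadrillion, and the 7x year-over-year multiple from the key takeaways). The 30-day month is an assumption, and the variable names are illustrative.

```python
# Figures as quoted in this summary; nothing here is independently verified.
TOKENS_PER_MONTH_TWO_YEARS_AGO = 9.7e12   # 9.7 trillion
TOKENS_PER_MONTH_NOW = 3.2e15             # 3.2 quadrillion
YOY_MULTIPLE = 7                          # "7x year-over-year" (key takeaways)
DAYS_PER_MONTH = 30                       # assumption: a 30-day month

# Total growth over the two-year window.
two_year_multiple = TOKENS_PER_MONTH_NOW / TOKENS_PER_MONTH_TWO_YEARS_AGO

# Monthly volume one year ago, if the 7x claim applies to tokens.
implied_last_year = TOKENS_PER_MONTH_NOW / YOY_MULTIPLE

# Growth that must then have happened in the earlier of the two years.
prior_year_multiple = implied_last_year / TOKENS_PER_MONTH_TWO_YEARS_AGO

# Average daily volume at the current monthly rate.
tokens_per_day = TOKENS_PER_MONTH_NOW / DAYS_PER_MONTH

print(f"two-year multiple:        {two_year_multiple:.0f}x")
print(f"implied volume last year: {implied_last_year / 1e12:.0f} trillion/month")
print(f"earlier-year multiple:    {prior_year_multiple:.0f}x")
print(f"current daily volume:     {tokens_per_day / 1e12:.0f} trillion/day")
```

Under these assumptions the two-year multiple is roughly 330x. If the most recent year accounts for 7x of that, the earlier year would have to account for about 47x, so the quoted figures imply that growth was far from even across the two years.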

2. Gemini 3.5 Flash: Speed and Intelligence at Half the Price

Gemini 3.5 Flash outperforms flagship models from OpenAI and Anthropic on multiple benchmarks while running at about 4x their output speed (280 tokens/second vs. 60-70). Sundar Pichai emphasized that it is priced at less than half of what competitors charge, and said enterprises could save over $1 billion annually by shifting 80% of their workloads to Flash.
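To see what the speed and savings claims imply, here is a minimal sketch of the arithmetic. It assumes that spend is proportional to workload share and that Flash costs exactly half the competitor price (the summary only says "less than half"). The function names are hypothetical, and no real pricing data is used.

```python
def blended_cost(baseline_spend: float, shifted_fraction: float, price_ratio: float) -> float:
    """Spend after moving `shifted_fraction` of the work to a model that
    costs `price_ratio` times the original price per unit of work."""
    kept = baseline_spend * (1 - shifted_fraction)
    moved = baseline_spend * shifted_fraction * price_ratio
    return kept + moved


def min_baseline_for_saving(target_saving: float, shifted_fraction: float, price_ratio: float) -> float:
    """Smallest baseline spend at which the shift saves `target_saving`."""
    saving_rate = shifted_fraction * (1 - price_ratio)
    return target_saving / saving_rate


# Output speed ratio, using the quoted 280 vs. 60-70 tokens/second.
speed_ratio_low = 280 / 70
speed_ratio_high = 280 / 60

# Assumptions: 80% of spend shifts, at exactly half the price.
cost_per_100 = blended_cost(100.0, shifted_fraction=0.8, price_ratio=0.5)
baseline_needed = min_baseline_for_saving(1e9, shifted_fraction=0.8, price_ratio=0.5)

print(f"speed ratio: {speed_ratio_low:.2f}x to {speed_ratio_high:.2f}x")
print(f"cost per $100 of baseline spend after the shift: ${cost_per_100:.2f}")
print(f"baseline spend needed to save $1B: ${baseline_needed / 1e9:.2f}B")
```

With these inputs the shift cuts spend by 40%, so a $1 billion annual saving corresponds to a baseline of about $2.5 billion per year. A price ratio below one half lowers that threshold.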

3. Gemini Omni: The World Model Revolution

Omni is a true multimodal generative model trained simultaneously on text, audio, images, and video. It produces scientifically accurate, physics-coherent content with proper synchronization. Users can iteratively edit videos through natural language, maintaining character consistency and scene coherence. All outputs include imperceptible SynthID watermarks.

4. Infrastructure: TPU8 and Global Training at Scale

Google introduced eighth-generation TPUs in a specialized two-chip design: TPU8T for training (3x the power of the prior generation) and TPU8 for inference (optimized for latency). Training is now distributed across more than 1 million TPUs globally via JAX and Pathways. Google's annual capex has increased roughly 6x, from $31 billion in 2022 to $180-190 billion.

5. Antigravity 2.0: From Code Editor to Agentic Platform

Antigravity has evolved into a complete platform for developing and managing autonomous agents, with a standalone desktop app. It features a Flash variant optimized for 12x speed and processed more than 3 trillion tokens daily in March, a volume that is doubling every few weeks. It also includes a managed agents API, custom SDKs, and Firebase integration.

6. Developer Tools and Android/Web Integration

Google AI Studio now supports Kotlin for Android development, with one-click deployment to Cloud Run. Android agents can interact directly with Android Studio, and open-sourced Android skills help models follow best practices. WebMCP is a new standard that lets browser-based AI agents reliably call JavaScript functions and submit HTML forms.
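The idea behind a standard like WebMCP, as described here, is that a page declares the actions it supports so that an agent can call them by name instead of guessing at buttons and form fields. The sketch below is a language-neutral model of that pattern written in Python. It is not the WebMCP API, which runs in the browser, and every name in it (Tool, ToolRegistry, add_to_cart) is hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    """A page-declared action. Hypothetical shape, not the WebMCP API."""
    name: str
    description: str
    params: Dict[str, type]        # parameter name -> expected Python type
    handler: Callable[..., Any]


class ToolRegistry:
    """Holds the declared tools and validates calls before running them."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> List[dict]:
        """What an agent would read to discover the available actions."""
        return [
            {
                "name": t.name,
                "description": t.description,
                "params": {k: v.__name__ for k, v in t.params.items()},
            }
            for t in self._tools.values()
        ]

    def call(self, name: str, /, **args: Any) -> Any:
        tool = self._tools[name]   # KeyError if the tool was never declared
        if set(args) != set(tool.params):
            raise ValueError(f"{name} expects parameters {sorted(tool.params)}")
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return tool.handler(**args)


def add_to_cart(sku: str, quantity: int) -> dict:
    # Stand-in for the page's own function; a real page would update its state.
    return {"sku": sku, "quantity": quantity, "status": "added"}


registry = ToolRegistry()
registry.register(
    Tool("add_to_cart", "Add an item to the cart", {"sku": str, "quantity": int}, add_to_cart)
)

# The agent calls a declared tool by name with structured arguments.
result = registry.call("add_to_cart", sku="ABC-123", quantity=2)
print(result)
```

Validating argument names and types before the handler runs is one plausible reading of the "reliably" in the description: a malformed call fails with a clear error instead of producing a half-completed page interaction.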

7. Gemini Spark: Persistent Personal AI Agent

Spark is a 24/7 agent powered by Gemini 3.5 that runs on dedicated Google Cloud VMs. It integrates with 30+ tools, including Adobe, Dropbox, and Uber. It is accessible via the Gemini app, email, and chat, and will operate in Chrome later this summer. It handles email drafting, calendar management, document retrieval, and background follow-ups.

8. Consumer-Facing Agentic Features in Search and Google Apps

Information agents run 24/7 in Search to proactively find and recommend content. Generative UI creates dynamic custom layouts for search results. Ask YouTube jumps to the most relevant video segment. Docs Live enables voice-based document creation and editing. Maps gets natural conversation support. Daily Brief synthesizes inbox, calendar, and tasks into prioritized digests.

9. Creative Tools and Accessibility: Pix, Flow, and Eyewear

Google Pix is a new AI image editor that treats elements as individual objects for fine control. Google Flow enables creative tool integration and code generation. Audio glasses, built in partnership with Gentle Monster and Warby Parker, launch this fall with voice-controlled Gemini access, real-time translation, and hands-free calling. Display glasses that show visual information are coming later.

10. Safety, Watermarking, and Cross-Industry Standards

SynthID imperceptibly watermarks all Omni-generated videos, and the watermark can be verified in Gemini, Chrome, and Search. SynthID has watermarked 100+ billion images and videos plus 60,000 years of audio. OpenAI, Kakao, and ElevenLabs have adopted SynthID. Voice cloning is being rolled out cautiously, and deepfake prevention remains a priority.

Glossary

Tokens: Basic units of text that AI models process; a measure of computational workload and data throughput
Frontier Models: State-of-the-art AI models representing the cutting edge of machine learning capability
Multimodal: AI systems that can process and generate multiple types of data simultaneously (text, audio, images, video)
World Model: An AI system that understands and accurately represents relationships and physics across different data types
Agentic AI: AI systems capable of autonomous planning, reasoning, and taking actions across extended periods without constant user instruction
SynthID: Google's imperceptible watermarking technology for AI-generated content that verifies authenticity
TPU (Tensor Processing Unit): Google's custom hardware chips optimized for machine learning training and inference
MCP (Model Context Protocol): An open standard allowing AI agents to interact with external tools and services
WebMCP: An extension of MCP for web environments, enabling browser-based AI agents to execute JavaScript functions and DOM interactions
Generative UI: Dynamic user interfaces created in real-time by AI based on user queries and context
