Google launches Gemini 3.1 Flash Lite at one-eighth the cost of its Pro model

Google's new Gemini 3.1 Flash Lite AI model offers enterprise-grade performance at one-eighth the cost of its Pro version, with 2.5x faster response times for high-volume tasks.

Mar 3, 2026
Technobezz

Google's newest AI model costs one-eighth of its flagship Pro version while delivering 2.5 times faster response times for high-volume enterprise workloads. The Gemini 3.1 Flash-Lite launched today at $0.25 per million input tokens and $1.50 per million output tokens, positioning it as the company's most cost-efficient model in the Gemini 3 series.

Built specifically for intelligence at scale, Flash-Lite targets the millions of daily tasks that require consistent, repeatable results without massive compute overhead. It handles translation, tagging, and moderation workloads where latency matters more than deep reasoning capabilities.

The model is rolling out in preview to developers via Google AI Studio and Vertex AI.

Performance benchmarks show Flash-Lite outperforming its predecessor, with a 2.5 times faster time to first token and a 45 percent increase in output speed. According to internal testing, it generates 363 tokens per second, compared to Gemini 2.5 Flash's 249 tokens per second.

This low latency makes it suitable for real-time customer support, live content moderation, and instant user interface generation where response time determines user experience. The model scored an Elo rating of 1432 on the Arena.ai Leaderboard, placing it competitively against larger systems despite its "Lite" designation.

Key benchmark results include 86.9 percent on GPQA Diamond for scientific knowledge, 76.8 percent on MMMU-Pro for multimodal understanding, and 88.9 percent on MMMLU for multilingual question answering.

Flash-Lite introduces thinking levels that allow developers to modulate reasoning intensity dynamically. For simple classification or high-volume sentiment analysis, the model can operate at maximum speed with minimum cost. For complex code exploration or dashboard generation, thinking can be dialed up for deeper reasoning before emitting responses.
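A minimal sketch of how a pipeline might exploit this feature, mapping task types to a reasoning intensity before each request. The task categories and level names ("minimal", "high") here are illustrative assumptions, not the official API values:

```python
# Hypothetical mapping of task types to thinking levels. The category
# names and level strings are assumptions for illustration only.
THINKING_LEVELS = {
    "classification": "minimal",        # high-volume, speed-first work
    "sentiment": "minimal",
    "code_exploration": "high",         # dial up reasoning depth
    "dashboard_generation": "high",
}

def thinking_level_for(task_type: str) -> str:
    """Pick a reasoning intensity for a task, defaulting to minimal."""
    return THINKING_LEVELS.get(task_type, "minimal")

print(thinking_level_for("classification"))   # speed-first path
print(thinking_level_for("code_exploration")) # deeper-reasoning path
```

The point of the pattern is that one model serves both paths; only the per-request configuration changes.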

Compared to competitors like Claude 4.5 Haiku, priced at $1.00 per million input and $5.00 per million output tokens, Flash-Lite charges a quarter as much for input and less than a third as much for output. Even against Gemini 2.5 Flash at $0.30 per million input tokens, the new model pairs performance gains with a price cut.
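The gap is easy to see with a back-of-the-envelope calculation using the per-million-token prices quoted above. The workload figures (one million requests at 500 input and 200 output tokens each) are made up for illustration:

```python
# Per-million-token prices quoted in the article: (input $, output $)
PRICES = {
    "gemini-3.1-flash-lite": (0.25, 1.50),
    "claude-4.5-haiku": (1.00, 5.00),
}

def monthly_cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
    """Total cost for a workload of `requests` calls at fixed token counts."""
    p_in, p_out = PRICES[model]
    return requests * (in_tok * p_in + out_tok * p_out) / 1_000_000

lite = monthly_cost("gemini-3.1-flash-lite", 1_000_000, 500, 200)
haiku = monthly_cost("claude-4.5-haiku", 1_000_000, 500, 200)
print(f"Flash-Lite: ${lite:,.2f}  Haiku: ${haiku:,.2f}")
# → Flash-Lite: $425.00  Haiku: $1,500.00
```

At this volume the same workload costs roughly 3.5 times more on Haiku, and the ratio only grows as output tokens dominate.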

Enterprise technical decision-makers can now implement a cascading architecture using Gemini 3.1 Pro for complex planning and architectural design, then hand off high-frequency execution to Flash-Lite at one-eighth the cost. In high-context usage above 200,000 tokens per interaction, Flash-Lite becomes between 12 and 16 times cheaper than the Pro variant.
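The cascading pattern described above can be sketched as a simple router that sends complex planning to the Pro model and everything else to Flash-Lite. The model identifiers and the complexity heuristic below are placeholder assumptions; a production router would use richer signals than a task label:

```python
# Illustrative router for the cascading architecture: Pro for planning
# and design, Flash-Lite for high-frequency execution. Model names and
# the routing heuristic are assumptions, not an official API.
PRO_TASKS = {"planning", "architecture"}

def pick_model(task: dict) -> str:
    """Route a task to a model tier based on a simple kind label."""
    if task.get("kind") in PRO_TASKS:
        return "gemini-3.1-pro"
    return "gemini-3.1-flash-lite"

tasks = [
    {"kind": "planning", "prompt": "Design the service layout"},
    {"kind": "tagging", "prompt": "Tag this product photo"},
    {"kind": "moderation", "prompt": "Review this comment"},
]
print([pick_model(t) for t in tasks])
```

Because only the rare planning tasks hit the Pro tier, the bulk of traffic runs at the one-eighth price point.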

"Early feedback from Google's partner network highlights practical applications across industries. Kolby Nottingham, Head of AI at Latitude, reported a 20 percent higher success rate and 60 percent faster inference times compared to previous models."

"Bianca Rangecroft, CEO of Whering, achieved 100 percent consistency in item tagging by integrating Flash-Lite into their classification pipeline."

The release completes Google's tiered strategy launched weeks earlier with Gemini 3.1 Pro in February 2026. While Pro models handle deep research and high-stakes synthesis with capabilities like vibe-coding animated SVGs from text prompts, Flash-Lite serves as the workhorse for scalable execution.

Developers building via the Gemini API receive a direct performance upgrade at the same or lower price points with this release.
