
    Google Rolls Out Gemini 3.1 Pro Upgrade With Strong Reasoning Gains


    TL;DR

    • Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2 logic testing.
    • Model keeps a 1M token context and expands output to 65k tokens.
    • New custom tools endpoint improves file actions and coding agents.
    • Preview rolls out across Gemini app, Vertex AI, and developer tools.

    Google has released Gemini 3.1 Pro, an updated model designed to improve complex reasoning, planning, and tool use across consumer and enterprise services. The company said the model more than doubles the ARC-AGI-2 score achieved by Gemini 3 Pro, delivering stronger performance in areas that require problem solving rather than simple text generation. The update is now rolling out in the Gemini app, Vertex AI, NotebookLM, and through developer tools.

    Gemini 3.1 Pro reached a verified 77.1% on the ARC-AGI-2 benchmark. The benchmark measures a model’s ability to reason through new logic patterns not contained in training data. Google said the improvement supports agent-driven workloads, which depend on stable long-form reasoning across many steps in a task.

    The release follows last week’s Gemini 3 Deep Think update, which targeted scientific and engineering use cases. Google said the new model builds on that work while offering wider access for developers and enterprise users.

    Gemini 3.1 Pro Expanded Context Window and Output Capacity

    Gemini 3.1 Pro supports a one-million-token input context window. This allows users to load full code repositories, research datasets, or long documents into a single request. Google said the model can maintain stable reasoning across files and data segments when the content spans hundreds of thousands of tokens.
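As a rough illustration, a client can pre-check whether a set of files is likely to fit the one-million-token input window before packing them into a single request. The four-characters-per-token ratio below is a common heuristic, not Google's tokenizer; the API's token-counting endpoint gives exact figures.

```python
# Pre-flight check against Gemini 3.1 Pro's 1M-token input window.
# CHARS_PER_TOKEN is a rough heuristic assumption, not the model's
# actual tokenizer; use the API's token-counting call for exact numbers.

INPUT_TOKEN_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4  # heuristic assumption

def estimate_tokens(text: str) -> int:
    """Cheap client-side token estimate."""
    return len(text) // CHARS_PER_TOKEN

def fits_context_window(documents: list[str]) -> bool:
    """Return True if the combined documents likely fit one request."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= INPUT_TOKEN_LIMIT

# Example: three files totaling 1.2M characters, roughly 300k tokens.
docs = ["x" * 400_000, "y" * 400_000, "z" * 400_000]
print(fits_context_window(docs))  # True
```

A check like this lets an agent decide up front whether to send one request or fall back to chunking.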

    The model also introduces a 65,000-token output window. This supports long-form generation, including technical manuals, structured reports, or multi-file code output. Google said this wider output window reduces task fragmentation, as large outputs can complete in a single response.

    The company said these upgrades support developers who build autonomous agents. These agents often need to read large collections of files, move through complex directories, or generate long technical results.



    Improved Benchmarks Across Logic, Coding, and Science

    Google reported gains across several internal and external benchmarks. The model scored 94.1% on GPQA Diamond, which tests scientific reasoning. It reached 92.6% on MMMLU for multimodal understanding. The model also posted strong results on coding tests, including SWE-Bench Verified and LiveCodeBench Pro.

    The company said the gains come from refinements in how the model allocates reasoning tokens. That allocation scheme is designed to reduce errors during long-horizon tasks and produce more stable outputs across dependent steps.

    Google said the model can handle scientific workflows that need grounded reasoning or calculations. It can also support engineering teams that require robust code generation and complex debugging.

    New Tools and Updated Agent Workflows

    With this release, Google introduced a specialized endpoint called gemini-3.1-pro-preview-customtools. The endpoint is optimized for developers who use file system navigation, code search, and structured tool calls. The model is tuned to prioritize local tools, reducing the chance of unnecessary external searches.
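As a sketch of what a request to the new endpoint could look like, the payload below declares a local code-search tool. The `search_code` tool and its schema are invented for illustration; only the model name comes from Google's announcement, and the field layout follows the public generateContent REST shape, which the preview endpoint may extend.

```python
# Hypothetical request payload for the custom-tools preview model.
# The search_code tool is invented for illustration; only the model
# name is taken from Google's announcement.
payload = {
    "model": "gemini-3.1-pro-preview-customtools",
    "contents": [
        {"role": "user", "parts": [{"text": "Find where retries are configured."}]}
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "search_code",
                    "description": "Search the local repository for a pattern.",
                    "parameters": {
                        "type": "object",
                        "properties": {"pattern": {"type": "string"}},
                        "required": ["pattern"],
                    },
                }
            ]
        }
    ],
}
print(payload["model"])  # gemini-3.1-pro-preview-customtools
```

Because the endpoint is tuned to favor declared local tools, an agent built this way should call `search_code` rather than reach for an external search when the answer lives in the repository.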

    The update also integrates with Google Antigravity, the company’s agent development platform. Developers can set a “medium” thinking level for tasks that need balanced depth and latency. Google said this option helps teams manage reasoning budgets while maintaining accuracy.

    The Interactions API also includes a breaking change. The field total_reasoning_tokens is now named total_thought_tokens. Google said the change supports thought signatures, which preserve reasoning context for multi-turn workflows.

    Pricing, Access, and Deployment Across Google Products

    Pricing for Gemini 3.1 Pro Preview remains the same as the earlier model. Input tokens cost $2 per million for prompts under 200,000 tokens and $4 per million for larger prompts. Output tokens cost $12 per million for shorter prompts and $18 per million for longer prompts. Context caching remains available for workloads that require repeated calls.
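The tiered rates above can be turned into a simple cost estimator. This sketch applies the published per-million-token prices and ignores context-caching discounts.

```python
# Cost estimate for Gemini 3.1 Pro Preview at the published preview
# rates (USD per million tokens). Context-caching discounts are
# ignored in this sketch.

LONG_PROMPT_THRESHOLD = 200_000  # tokens; the tier boundary

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the request cost in USD at the tiered preview rates."""
    if input_tokens <= LONG_PROMPT_THRESHOLD:
        input_rate, output_rate = 2.0, 12.0   # shorter prompts
    else:
        input_rate, output_rate = 4.0, 18.0   # longer prompts
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 150k-token prompt with a 10k-token answer:
print(round(estimate_cost(150_000, 10_000), 2))  # 0.42
```

The same 10k-token answer behind a 300k-token prompt would cost $1.38, since both rates jump once the prompt crosses the 200,000-token boundary.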

    The model is accessible through the Gemini API, Google AI Studio, Android Studio, and the Gemini CLI. Enterprise users can access the model through Vertex AI and Gemini Enterprise. Consumers can use the model in the Gemini app and NotebookLM with higher limits for paid subscribers.

    Google said the preview period will allow the company to refine model behavior and safety before general availability. The company added that Gemini 3.1 Pro is positioned as a foundation for agentic AI systems that must reason through long tasks and work across complex environments.




