
    Google Rolls Out Gemini 3.1 Pro Upgrade With Strong Reasoning Gains


    TL;DR

    • Gemini 3.1 Pro achieves 77.1% on ARC-AGI-2 logic testing.
    • Model keeps a 1M token context and expands output to 65k tokens.
    • New custom tools endpoint improves file actions and coding agents.
    • Preview rolls out across Gemini app, Vertex AI, and developer tools.

    Google has released Gemini 3.1 Pro, an updated model designed to improve complex reasoning, planning, and tool use across consumer and enterprise services. The company said the model more than doubles the ARC-AGI-2 score achieved by Gemini 3 Pro, delivering stronger performance in areas that require problem solving rather than simple text generation. The update is now rolling out in the Gemini app, Vertex AI, NotebookLM, and through developer tools.

    Gemini 3.1 Pro reached a verified 77.1% on the ARC-AGI-2 benchmark. The benchmark measures a model’s ability to reason through new logic patterns not contained in training data. Google said the improvement supports agent-driven workloads, which depend on stable long-form reasoning across many steps in a task.

    The release follows last week’s Gemini 3 Deep Think update, which targeted scientific and engineering use cases. Google said the new model builds on that work while offering wider access for developers and enterprise users.

    Gemini 3.1 Pro Expanded Context Window and Output Capacity

    Gemini 3.1 Pro supports a one-million-token input context window. This allows users to load full code repositories, research datasets, or long documents into a single request. Google said the model can maintain stable reasoning across files and data segments when the content spans hundreds of thousands of tokens.
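As a rough illustration, a client can pre-check whether a set of files is likely to fit the one-million-token input window before packing them into a single request. The four-characters-per-token ratio below is a common heuristic, not Google's tokenizer; the API's token-counting endpoint gives exact figures.

```python
# Pre-flight check against Gemini 3.1 Pro's 1M-token input window.
# CHARS_PER_TOKEN is a rough heuristic assumption, not the model's
# actual tokenizer; use the API's token-counting call for exact numbers.

INPUT_TOKEN_LIMIT = 1_000_000
CHARS_PER_TOKEN = 4  # heuristic assumption

def estimate_tokens(text: str) -> int:
    """Cheap client-side token estimate."""
    return len(text) // CHARS_PER_TOKEN

def fits_context_window(documents: list[str]) -> bool:
    """Return True if the combined documents likely fit one request."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= INPUT_TOKEN_LIMIT

# Example: three files totaling 1.2M characters, roughly 300k tokens.
docs = ["x" * 400_000, "y" * 400_000, "z" * 400_000]
print(fits_context_window(docs))  # True
```

A check like this lets an agent decide up front whether to send one request or fall back to chunking.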

    The model also introduces a 65,000-token output window. This supports long-form generation, including technical manuals, structured reports, or multi-file code output. Google said this wider output window reduces task fragmentation, as large outputs can complete in a single response.

    The company said these upgrades support developers who build autonomous agents. These agents often need to read large collections of files, move through complex directories, or generate long technical results.



    Improved Benchmarks Across Logic, Coding, and Science

    Google reported gains across several internal and external benchmarks. The model scored 94.1% on GPQA Diamond, which tests scientific reasoning. It reached 92.6% on MMMLU for multimodal understanding. The model also posted strong results on coding tests, including SWE-Bench Verified and LiveCodeBench Pro.

    The company said the gains come from refinements in how the model allocates reasoning tokens. That allocation scheme is designed to reduce errors during long-horizon tasks and produce more stable outputs across dependent steps.

    Google said the model can handle scientific workflows that need grounded reasoning or calculations. It can also support engineering teams that require robust code generation and complex debugging.

    New Tools and Updated Agent Workflows

    With this release, Google introduced a specialized endpoint called gemini-3.1-pro-preview-customtools. The endpoint is optimized for developers who use file system navigation, code search, and structured tool calls. The model is tuned to prioritize local tools, reducing the chance of unnecessary external searches.
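As a sketch of what a request to the new endpoint could look like, the payload below declares a local code-search tool. The `search_code` tool and its schema are invented for illustration; only the model name comes from Google's announcement, and the field layout follows the public generateContent REST shape, which the preview endpoint may extend.

```python
# Hypothetical request payload for the custom-tools preview model.
# The search_code tool is invented for illustration; only the model
# name is taken from Google's announcement.
payload = {
    "model": "gemini-3.1-pro-preview-customtools",
    "contents": [
        {"role": "user", "parts": [{"text": "Find where retries are configured."}]}
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "search_code",
                    "description": "Search the local repository for a pattern.",
                    "parameters": {
                        "type": "object",
                        "properties": {"pattern": {"type": "string"}},
                        "required": ["pattern"],
                    },
                }
            ]
        }
    ],
}
print(payload["model"])  # gemini-3.1-pro-preview-customtools
```

Because the endpoint is tuned to favor declared local tools, an agent built this way should call `search_code` rather than reach for an external search when the answer lives in the repository.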

    The update also integrates with Google Antigravity, the company’s agent development platform. Developers can set a “medium” thinking level for tasks that need balanced depth and latency. Google said this option helps teams manage reasoning budgets while maintaining accuracy.

    The Interactions API also includes a breaking change. The field total_reasoning_tokens is now named total_thought_tokens. Google said the change supports thought signatures, which preserve reasoning context for multi-turn workflows.

    Pricing, Access, and Deployment Across Google Products

    Pricing for Gemini 3.1 Pro Preview remains the same as the earlier model. Input tokens cost $2 per million for prompts under 200,000 tokens and $4 per million for larger prompts. Output tokens cost $12 per million for shorter prompts and $18 per million for longer prompts. Context caching remains available for workloads that require repeated calls.
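The tiered rates above can be turned into a simple cost estimator. This sketch applies the published per-million-token prices and ignores context-caching discounts.

```python
# Cost estimate for Gemini 3.1 Pro Preview at the published preview
# rates (USD per million tokens). Context-caching discounts are
# ignored in this sketch.

LONG_PROMPT_THRESHOLD = 200_000  # tokens; the tier boundary

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the request cost in USD at the tiered preview rates."""
    if input_tokens <= LONG_PROMPT_THRESHOLD:
        input_rate, output_rate = 2.0, 12.0   # shorter prompts
    else:
        input_rate, output_rate = 4.0, 18.0   # longer prompts
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 150k-token prompt with a 10k-token answer:
print(round(estimate_cost(150_000, 10_000), 2))  # 0.42
```

The same 10k-token answer behind a 300k-token prompt would cost $1.38, since both rates jump once the prompt crosses the 200,000-token boundary.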

    The model is accessible through the Gemini API, Google AI Studio, Android Studio, and the Gemini CLI. Enterprise users can access the model through Vertex AI and Gemini Enterprise. Consumers can use the model in the Gemini app and NotebookLM with higher limits for paid subscribers.

    Google said the preview period will allow the company to refine model behavior and safety before general availability. The company added that Gemini 3.1 Pro is positioned as a foundation for agentic AI systems that must reason through long tasks and work across complex environments.




