Google releases Gemini 3.1 Pro: Benchmark performance, how to try it

Google says that its most advanced thinking model yet outperforms Claude and ChatGPT on Humanity's Last Exam and other key benchmarks.

By Timothy Beck Werth

Gemini 3.1 Pro banner image. Credit: Google

Google released its latest core reasoning model, Gemini 3.1 Pro, on Thursday. Google says Gemini 3.1 Pro achieved more than twice the verified score of Gemini 3 Pro on ARC-AGI-2, a popular benchmark that measures a model's logical reasoning.

Google originally released Gemini 3 and 3 Pro in November, and this new release shows just how fast AI companies are introducing new and updated models. Gemini 3.1 Pro is the new core model powering Gemini and various Google AI tools, such as Gemini 3 Deep Think. Google says it's designed to provide more creative solutions.

"3.1 Pro is designed for tasks where a simple answer isn't enough, taking advanced reasoning and making it useful for your hardest challenges," a Google blog post states. "This improved intelligence can help in practical applications — whether you’re looking for a clear, visual explanation of a complex topic, a way to synthesize data into a single view, or bringing a creative project to life."

Here's everything we know so far about Gemini 3.1 Pro, including how it compares to the latest models from Anthropic and OpenAI, and how to try it yourself.

Starting today, Google is rolling out Gemini 3.1 Pro in the Gemini app, the Gemini API, and NotebookLM. Free users will be able to try 3.1 Pro in the Gemini app, while paid users on Google AI Pro and AI Ultra plans will get higher usage limits. Within NotebookLM, only those paid users will have access to 3.1 Pro, at least for now. Developers and enterprise users can also access the new core model through AI Studio, Antigravity, Vertex AI, Gemini Enterprise, Gemini CLI, and Android Studio.

Gemini 3.1 Pro was already available to Mashable editors using Gemini. To try it for yourself, head to Gemini on desktop or open the Gemini mobile app.

Two results of the same animation prompt. Credit: Google


Why Gemini 3.1 Pro matters

When Google released Gemini 3 Pro in November, the model was so impressive that it allegedly caused OpenAI CEO Sam Altman to declare a code red. As Gemini 3 Pro surged to the top of AI leaderboards, OpenAI reportedly started losing ChatGPT users to Gemini. The latest core ChatGPT model, GPT-5.2, has tumbled down the rankings on leaderboards like Arena (formerly known as LMArena), losing significant ground to competitors such as Google, Anthropic, and xAI.

Gemini 3 Pro was already outperforming GPT-5.2 on many benchmarks, and with a more advanced thinking model, Gemini could move even further ahead.

Gemini 3.1 Pro: Benchmark performance

Google released benchmark data showing that Gemini 3.1 Pro outperforms previous Gemini models, Claude Sonnet 4.6, Claude Opus 4.6, and GPT-5.2 on most key benchmarks. However, OpenAI's new coding model, GPT-5.3-Codex, beat Gemini 3.1 Pro on the SWE-Bench Pro (Public) benchmark, according to Google's own data.

Notable highlights from Gemini 3.1 Pro's benchmark results include:

  • 44.4 percent on Humanity's Last Exam, compared to 40.0 percent for Claude Opus 4.6 and 34.5 percent for GPT-5.2

  • 77.1 percent on ARC-AGI-2, compared to 31.1 percent for Gemini 3 Pro, 68.8 percent for Claude Opus 4.6, and 52.9 percent for GPT-5.2

  • 94.3 percent on GPQA Diamond, compared to 91.9 percent for Gemini 3 Pro, 91.3 percent for Claude Opus 4.6, and 92.4 percent for GPT-5.2

  • 80.6 percent on SWE-Bench Verified, compared to 76.2 percent for Gemini 3 Pro, 80.8 percent for Claude Opus 4.6, and 80.0 percent for GPT-5.2

  • 54.2 percent on SWE-Bench Pro (Public), compared to 43.3 percent for Gemini 3 Pro, 55.6 percent for GPT-5.2, and 56.8 percent for GPT-5.3-Codex

  • 92.6 percent on MMLU, compared to 91.1 percent for Claude Opus 4.6 and 89.6 percent for GPT-5.2

Google also released an image on X showing the full benchmark results for Gemini 3.1 Pro.

Disclosure: Ziff Davis, Mashable’s parent company, in April 2025 filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.

Timothy Beck Werth is the Tech Editor at Mashable, where he leads coverage and assignments for the Tech and Shopping verticals. Tim has over 15 years of experience as a journalist and editor, and he has particular experience covering and testing consumer technology, smart home gadgets, and men’s grooming and style products. Previously, he was the Managing Editor and then Site Director of SPY.com, a men's product review and lifestyle website. As a writer for GQ, he covered everything from bull-riding competitions to the best Legos for adults, and he’s also contributed to publications such as The Daily Beast, Gear Patrol, and The Awl.

Tim studied print journalism at the University of Southern California. He currently splits his time between Brooklyn, NY and Charleston, SC. He's currently working on his second novel, a science-fiction book.

