The release of Claude Opus 4.5 this week didn't knock Gemini 3 Pro off the top of the LMArena leaderboard, meaning today's episode of my podcast about Google retaking the lead in AI (which I recorded a week ago) is still relevant, woohoo! Here are the details...

Google appears to have regained the top spot in AI capability after trailing OpenAI and (later) Anthropic for years. Here's what you need to know about Gemini 3 Pro's performance:

GEMINI 3 PRO'S BENCHMARK DOMINANCE

Debuted in first place on the (difficult-to-game) LMArena leaderboard across all key tracks, e.g., text reasoning, vision, coding and web development.
Scored 38% on Humanity's Last Exam (vs. GPT-5.1's 27%) and 23% on MathArena Apex (vs. ~1% for GPT-5.1 and Claude Sonnet 4.5).
Achieved 31% on the ARC-AGI-2 visual reasoning challenge (vs. ~18% for GPT-5.1 and ~14% for Claude Sonnet 4.5), a remarkable jump in abstract problem-solving.
Set new high-water marks on the AIME math, GPQA Diamond scientific knowledge and MMMU-Pro multimodal reasoning benchmarks as well.

CODING & AGENTIC AI CAPABILITIES

Crushed rivals on LiveCodeBench Pro while performing comparably on SWE-Bench Verified.
Generated $5,500 profit on the Vending-Bench 2.0 simulation (vs. Claude Sonnet 4.5's $4,000 and GPT-5.1's $1,500).
Scored 73% on the ScreenSpot-Pro benchmark for screen understanding, more than double Claude Sonnet 4.5's 36%.
All of the above is significant for those of us developing agentic applications with LLMs :)
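
If you want a feel for how big these gaps are in relative terms, here is a quick Python sketch. It uses only the rounded figures quoted in this post, so treat the ratios as approximate. The `quoted` table and the `lead_ratio` helper are my own illustrative names, and for each benchmark I compare against the best-scoring rival mentioned above.

```python
# Rounded figures as quoted in this post: (Gemini 3 Pro, best rival named above).
# These are approximations, not official leaderboard values.
quoted = {
    "Humanity's Last Exam (%)": (38, 27),
    "ARC-AGI-2 (%)": (31, 18),
    "Vending-Bench 2.0 ($)": (5500, 4000),
    "ScreenSpot-Pro (%)": (73, 36),
}


def lead_ratio(gemini: float, rival: float) -> float:
    """Return Gemini 3 Pro's score as a multiple of the rival's score."""
    return gemini / rival


for name, (gemini, rival) in quoted.items():
    print(f"{name}: {lead_ratio(gemini, rival):.2f}x the best named rival")
```

The ScreenSpot-Pro ratio comes out just above 2, which is where "more than double" comes from. MathArena Apex is left out on purpose: with the rivals at roughly 1%, the ratio would swing wildly with rounding.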

IMAGE GENERATION / IMAGE EDITING

If you haven't played with Google's Nano Banana Pro for image generation / image editing yet, go do it, it's pretty insane! The Gemini 3 Pro LLM is a key part of the magic running in the backend.

The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.