Google DeepMind is rolling out Gemini 2.5 Deep Think, which the company says is its most advanced AI reasoning model. It answers questions by exploring multiple ideas in parallel, then drawing on those candidate outputs to choose the best answer.
Subscribers to Google’s $250-per-month Ultra plan will gain access to Gemini 2.5 Deep Think in the Gemini app starting Friday.
First unveiled in May at Google I/O 2025, Gemini 2.5 Deep Think is Google’s first publicly available multi-agent model. These systems spawn multiple AI agents to tackle a question in parallel, a process that uses significantly more computational resources than a single agent but tends to produce better answers.
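Google hasn’t published the internals of the system, but the general pattern behind this kind of parallel “thinking” (sampling several candidate answers at once, then selecting among them) can be sketched in a few lines of Python. Everything below, including the `generate` and `score` stand-ins and the agent count, is illustrative rather than Google’s actual implementation:

```python
import concurrent.futures

def generate(question: str, seed: int) -> str:
    # Stand-in for one agent's attempt; a real system would call a model here.
    return f"candidate answer #{seed} to: {question!r}"

def score(answer: str) -> float:
    # Stand-in judge; real systems might use a verifier model or majority voting.
    return float(len(answer))

def parallel_think(question: str, n_agents: int = 4) -> str:
    # Spawn several "agents" that attack the same question independently...
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_agents) as pool:
        candidates = list(pool.map(lambda s: generate(question, s), range(n_agents)))
    # ...then spend a final step choosing the strongest candidate, trading
    # extra compute for answer quality.
    return max(candidates, key=score)

print(parallel_think("What is the 10th prime number?"))
```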
Google used a variation of Gemini 2.5 Deep Think to score a gold medal at this year’s International Math Olympiad (IMO).
Alongside Gemini 2.5 Deep Think, the company says it is releasing the model it used at the IMO to a select group of mathematicians and academics. Google says this AI model “takes hours to reason,” rather than the seconds or minutes typical of consumer-facing AI models. The company hopes the IMO model will aid research efforts, and it wants feedback on how to improve the multi-agent system for academic use cases.
Google notes that the Gemini 2.5 Deep Think model is a significant improvement over what it announced at I/O. The company also claims to have developed “novel reinforcement learning techniques” to encourage Gemini 2.5 Deep Think to make better use of its reasoning paths.
“Deep Think can help people tackle problems that require creativity, strategic planning and making improvements step-by-step,” said Google in a blog post shared with TechCrunch.
The company says Gemini 2.5 Deep Think achieves state-of-the-art performance on Humanity’s Last Exam (HLE) — a challenging test measuring AI’s ability to answer thousands of crowdsourced questions across math, humanities, and science. Google claims its model scored 34.8% on HLE (without tools), compared to xAI’s Grok 4, which scored 25.4%, and OpenAI’s o3, which scored 20.3%.
Google also says Gemini 2.5 Deep Think outperforms AI models from OpenAI, xAI, and Anthropic on LiveCodeBench v6, a challenging test of competitive coding tasks. Google’s model scored 87.6%, whereas Grok 4 scored 79% and OpenAI’s o3 scored 72%.

Gemini 2.5 Deep Think automatically works with tools such as code execution and Google Search, and the company says it’s capable of producing “much longer responses” than traditional AI models.
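The public Gemini API already exposes these tools for Google’s shipping Gemini 2.5 models, which gives a rough sense of what tool-augmented calls look like today. The model name and prompt below are illustrative, and Deep Think itself is not yet reachable this way:

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # assumes a valid Gemini API key

response = client.models.generate_content(
    model="gemini-2.5-flash",  # an already-shipping Gemini 2.5 model
    contents="What changed in the latest Gemini release?",
    config=types.GenerateContentConfig(
        # Grounding with Google Search; code execution is enabled the same way
        # with types.Tool(code_execution=types.ToolCodeExecution()).
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```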
In Google’s testing, the model produced more detailed and aesthetically pleasing results on web development tasks than other AI models. The company claims the model could aid researchers and “potentially accelerate the path to discovery.”

Several leading AI labs seem to be converging on the multi-agent approach.
Elon Musk’s xAI recently released a multi-agent system of its own, Grok 4 Heavy, which it says achieved industry-leading performance on several benchmarks. OpenAI researcher Noam Brown said on a podcast that the unreleased AI model the company used to win a gold medal at this year’s IMO was also a multi-agent system. Meanwhile, Anthropic’s Research agent, which generates thorough research briefs, is also powered by a multi-agent system.
For all their strong performance, multi-agent systems appear to be even costlier to serve than traditional AI models. That means tech companies may keep these systems gated behind their most expensive subscription plans, as xAI and now Google have chosen to do.
In the coming weeks, Google says it plans to share Gemini 2.5 Deep Think with a select group of testers via the Gemini API. The company says it wants to better understand how developers and enterprises may use its multi-agent system.
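If that API access follows the pattern of Google’s existing Gemini 2.5 endpoints, developers would reach the model through the same client, with reasoning depth controlled by the thinking configuration that current 2.5 models already accept. The model identifier here is hypothetical, since Google hasn’t announced one:

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-deep-think",  # hypothetical ID; no public API name exists yet
    contents="Find a winning strategy for the first player in 4x4 misère tic-tac-toe.",
    config=types.GenerateContentConfig(
        # include_thoughts asks current Gemini 2.5 models to return a summary
        # of their reasoning alongside the final answer.
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)
print(response.text)
```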