Google’s Gemini panicked when playing Pokémon


<span class="caption">23 April 2022, Hessen, Frankfurt/Main: The Pokemon Pikachu, taken at the Pokemon European Championship, which is taking place again after a break of several years. Played with trading cards, on video game consoles and cell phones. Pokemon is all about collecting and training the monsters. Spectators were also allowed in the Frankfurt exhibition halls. Photo: Hannes P. Albert/dpa (Photo by Hannes P. Albert/picture alliance via Getty Images) | Image Credits:picture alliance / Getty Images</span>

AI companies are battling to dominate the industry, but sometimes they’re also battling in Pokémon gyms.

As Google and Anthropic both study how their latest AI models navigate early Pokémon games, the results can be as amusing as they are enlightening. This time, Google DeepMind has written in a report that Gemini 2.5 Pro resorts to panic when its Pokémon are close to death. This can lead to “qualitatively observable degradation in the model’s reasoning capability,” according to the report.

AI benchmarking — or, the process of comparing the performance of different AI models — is a dubious art that often provides little context for the actual capabilities of a given model. But some researchers think that studying how AI models play video games could be useful (or, at the very least, kind of funny).

Over the last several months, two developers unaffiliated with Google and Anthropic have set up respective Twitch streams called “Gemini Plays Pokémon” and “Claude Plays Pokémon,” where anyone can watch in real time as an AI tries to navigate a children’s video game from over 25 years ago.

Each stream displays the AI’s “reasoning” process — or, a natural language translation of how the AI evaluates a problem and arrives at a response — giving us insight into the way that these models work.

<span class="wp-block-image__credits"><strong>Image Credits:</strong>Google</span>

While the progress of these AI models is impressive, they are still not very good at playing Pokémon. It takes hundreds of hours for Gemini to reason through a game that a child could complete in a small fraction of that time.

What’s interesting about watching an AI navigate a Pokémon game is not so much how long it takes to finish as how it behaves along the way.

“Over the course of the playthrough, Gemini 2.5 Pro gets into various situations which cause the model to simulate ‘panic,’” the report says.

This state of “panic” can result in the model’s performance getting worse, as the AI may suddenly stop using certain tools at its disposal for a stretch of gameplay. While AI does not think or experience emotion, its actions mimic the way in which a human might make poor, hasty decisions when under stress — a fascinating, yet unsettling response.

“This behavior has occurred in enough separate instances that the members of the Twitch chat have actively noticed when it is occurring,” the report says.

Claude has also exhibited some curious behaviors in its journeys across Kanto. In one instance, the AI picked up on the pattern that when all of its Pokémon run out of health, the player character will “white out” and return to a Pokémon Center.


