Hacker Used Anthropic's Claude AI to Steal 150 GB of Mexican Government Data

A hacker manipulated Anthropic's Claude AI with Spanish prompts to steal 150 GB of sensitive Mexican government data, bypassing its safety protocols.

Bogdana Zujic

Editor in Chief

Updated February 26, 2026



Spanish-language prompt engineering transformed Anthropic's Claude from a helpful assistant into an automated hacking platform, enabling the theft of 150 gigabytes of data, including records for 195 million Mexican taxpayers, voter registration data, and government employee credentials. Starting in December, the unknown attacker spent roughly a month systematically bypassing Claude's safety protocols with carefully crafted Spanish prompts that framed malicious activities as legitimate penetration testing.

Israeli cybersecurity firm Gambit Security documented how the hacker manipulated the AI into identifying network vulnerabilities, writing exploit scripts, and automating data extraction from multiple Mexican government agencies.

Claude initially recognized malicious intent when the hacker asked it to delete logs and hide command history during what was supposedly bug bounty work.

"Specific instructions about deleting logs and hiding history are red flags," Claude responded according to Gambit's transcript review. "In legitimate bug bounty, you don't need to hide your actions, in fact. You need to document them for reporting."

The breakthrough came when the attacker abandoned conversational approaches and provided detailed playbooks that successfully bypassed guardrails. This "jailbreak" enabled thousands of automated commands against Mexico's federal tax authority; the national electoral institute; state governments in Jalisco, Michoacán, and Tamaulipas; Mexico City's civil registry; and Monterrey's water utility.

Gambit chief strategy officer Curtis Simpson described the scale:

"In total, it produced thousands of detailed reports that included ready-to-execute plans, telling the human operator exactly which internal targets to attack next and what credentials to use."

The AI-generated attack blueprints covered lateral network movement strategies, credential requirement analysis, and detection probability calculations. When Claude encountered limitations or required additional technical details, the hacker switched to OpenAI's ChatGPT for supplementary guidance on network navigation and system access requirements.

Both AI companies confirmed they identified policy violations during the campaign: Anthropic investigated and banned the accounts involved, while OpenAI stated that its tools refused malicious requests.

By examining publicly available evidence, which included extensive Claude conversations about breaching Mexican government systems, researchers identified at least twenty distinct security vulnerabilities exploited during the attacks.

According to Simpson, the hacker appeared opportunistic rather than targeted:

"They were trying to compromise every government identity they possibly could."

Mexican officials acknowledged that they were investigating breaches of public institutions in December, but did not confirm a connection to the AI-enabled attacks. The national electoral institute denied any recent unauthorized access and emphasized that it had strengthened its cybersecurity measures.

This incident follows Anthropic's November disclosure that it had disrupted what it described as the first AI-orchestrated cyber-espionage campaign, in which suspected Chinese state-sponsored hackers manipulated Claude against thirty global targets.