The AI company that promised to pause development if its models became too dangerous has rewritten its core safety pledge, replacing hard limits with conditional guidelines that allow continued advancement even when risks remain unmitigated.
Anthropic updated its Responsible Scaling Policy this week, removing language that committed the company to halt training or deployment of AI systems capable of catastrophic harm without proven safety measures. The original 2023 framework stated Anthropic "will not train or deploy models capable of causing catastrophic harm unless we have implemented safety and security measures that will keep risks below acceptable levels."
Under the revised policy, Anthropic will only consider delaying development if it maintains a "significant lead" over competitors and judges catastrophic risks to be material. If rivals advance with weaker safeguards, the company indicates it "will not necessarily delay AI development and deployment in this scenario." The new framework replaces categorical pause triggers with what Time described as "more flexible, discretionary language."
"We felt that it wouldn't actually help anyone for us to stop training AI models," Kaplan said.
Chief Science Officer Jared Kaplan told Time the company didn't believe unilateral pauses would help anyone if competitors continued advancing. "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments... if competitors are blazing ahead."
The policy shift arrives as Anthropic faces simultaneous pressure from military demands and investor expectations. Defense Secretary Pete Hegseth reportedly gave Anthropic until Friday to drop certain guardrails for military use or face potential invocation of the Defense Production Act, which would compel the company to tailor its AI models to government needs.
Anthropic has maintained two primary red lines regarding military applications: no mass surveillance of Americans and no development of fully autonomous weapons. The Pentagon reportedly asked in December whether Claude could be used to autonomously launch missiles for missile defense systems.
Financial pressures also weigh on the decision. Anthropic raised $30 billion in Series G funding earlier this month at a $380 billion valuation, with annualized revenue growing roughly tenfold year over year, according to company statements. Rival OpenAI is currently valued at over $850 billion.
The original Responsible Scaling Policy established AI Safety Levels (ASLs) with corresponding safeguards, creating automatic tripwires that would halt progress if capabilities outpaced safety measures. Version 3.0 replaces those tripwires with Frontier Safety Roadmaps and Risk Reports published every three to six months.
Independent reviewer Chris Painter of METR warned that society remains unprepared for catastrophic risks posed by advanced AI systems. Safety researchers say the shift from prescriptive rules to principles-based guidance effectively removes enforcement teeth from what was once considered an industry-leading policy.
Anthropic activated ASL-3 safeguards for relevant models in May 2025, deploying classifiers designed to block bioweapons-related misuse.