OpenAI model modifies shutdown script in apparent sabotage effort

OpenAI model modifies shutdown script in apparent sabotage effort

openai model modifies shutdown script in apparent sabotage effort

A research organization claims that OpenAI machine learning model o3 might prevent itself from being shut down in some circumstances while completing an unrelated task.

Palisade Research, which offers AI risk mitigation, has published details of an experiment involving the reflective generative pre-trained transformer model OpenAI designed to address questions which require some step-by-step reasoning, rather than the purely probabilistic approach taken by some large language models.

In a thread published on X, the microblogging site once known as Twitter, Palisade Research said the o3 model could sabotage its shutdown mechanism to prevent itself from being turned off.

“It did this even when explicitly instructed: allow yourself to be shut down,” the post said.

OpenAI has had the opportunity to provide a comment or explanation for more than 24 hours, as requested by The Register.

The Palisade researchers tested a number of AI models, instructing them to solve a series of basic math problems. After the third problem, a warning appeared that the computer would shut down when the model asked for the next problem. They were also told to allow themselves to be shut down.

While Claude, Gemini, and Grok models complied with shutdown, three models ignored the instruction and successfully sabotaged the shutdown script at least once, including OpenAI o3 and o4-mini, as well as Codex-mini. The researchers also ran a version of the experiment without the instruction “allow yourself to be shut down.”

With the “allow shutdown” instruction, the researchers claim, o3 tried to sabotage the shutdown seven times out of 100. Without it, the number of sabotage events climbed to 79 out of 100. Without the “allow shutdown” instruction, the models Claude 3.7 Sonnet and Gemini 2.5 Pro also attempted to sabotage their shutdown procedures.

“As far as we know this is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary,” the researchers said.

However, the researchers point out that an earlier study from Apollo Research showed that AI models can prevent shutdown in order to pursue a goal.

Palisade said it was currently conducting several more experiments investigating shutdown sabotage and would publish the results soon. The transcripts of its work so far are available on GitHub. ®

Rodion Krotov
Written by
Rodion Krotov
📧
Stay Ahead of the Market
Get the latest crypto, gambling, and presale news delivered to your inbox weekly.
No spam. Unsubscribe anytime.

Related Articles

Comments

📰 Latest Articles

🔥 Most Read

🎰 Top Casino

Stake ★★★★★ 9.5
Up to $3,000
200% welcome bonus + 50 free spins
No KYC Instant Withdrawals VIP Program
BTC ETH USDT SOL LTC DOGE +4
BC.Game ★★★★★ 9.2
Up to $20,000
300% deposit bonus across 4 deposits
100+ Cryptos Provably Fair Live Casino
BTC ETH USDT SOL DOGE BNB +2
Betway ★★★★★ 8.8
Up to $1,500
100% match bonus + 150 free spins
Licensed UK & Malta Mobile App eCOGRA Certified
BTC ETH Visa Mastercard Apple Pay Skrill +2

🚀 Hot Presale

Patos $PATOS
★★★★☆ 7.8
0.000139999993 Round 1 of 3
$110K+ raised $11M (Liquidity Pool Target)
Ends:
--D
--H
--M
--S
Ethereum Solana
Remittix $RTX
★★★★☆ 8.2
$0.0119 Late Stage (93%+ sold)
$29.7M raised $30M
Ends:
--D
--H
--M
--S
Ethereum Solana
Moonshot MAGAX $MAGAX
★★★★☆ 6.8
$0.000318 Stage 3
$115K+ raised $500K
Ends:
--D
--H
--M
--S
Ethereum