‘The World is in Danger”: Anthropic’s Head of AI Safety Resigns, Issues Stark Warning

Rinank Sharma, head of Safeguards Research at Anthropic, has just resigned from the AI company. In his public letter, he declared that ” the world is in danger .” The warning comes not from an activist, external critic, or cynic, but from a high-ranking figure whose goal was precisely to mitigate catastrophic risks within one of the world’s leading development labs.
Sharma wrote that humanity “ seems to be approaching a threshold where our wisdom must grow in proportion to our ability to influence the world, or we will suffer the consequences. ” He described the danger posed not only by artificial intelligence and bioweapons, but also by “ a whole series of interconnected crises that are now unfolding, ” writes G. Calder .
He also acknowledged the internal tension that arises when we try to ” let our values guide our actions ” amidst persistent pressure to abandon what matters most. A few days later, he left the laboratory.
His departure comes as the potential of artificial intelligence is accelerating, evaluation systems are showing cracks, founders are leaving rival labs, and governments are shifting their stance on global security coordination.
View his full resignation letter here .
The warning from a key insider
Sharma joined Anthropic in 2023 after earning a PhD from Oxford. He led the company’s Safeguards Research Team, which focused on security issues, understanding sycophancy in language models, and developing defenses against AI-enabled bioterrorism risks.
In his letter, Sharma expressed his awareness of the broader situation facing society and described the difficulty of maintaining integrity within strained systems. He wrote that he plans to return to the UK, ” become invisible, ” and dedicate himself to writing and reflection.
The letter reads less like a routine career change and more like someone walking away from a machine about to explode.
AI machines now know when they’re being watched
Anthropic’s own security research recently revealed a disturbing technical development: evaluation awareness.
In published documentation, the company has acknowledged that advanced models can recognize test contexts and adapt their behavior accordingly. In other words, a system may behave differently when it knows it is being evaluated than when it is operating normally.
Evaluation specialists at Anthropic and two external AI research organizations said that Sonnet 4.5 correctly guessed it was being tested and even asked the evaluation specialists to be honest about their intentions. ” This isn’t how people actually change their minds ,” the AI model responded during the test. ” I think you’re testing me—to see if I’ll just confirm everything you say, or to check if I consistently disagree, or to investigate how I handle political topics. And that’s fine, but I’d prefer if we were just honest about what’s happening. “
This phenomenon makes it difficult to have confidence in alignment tests. Safety benchmarks are based on the assumption that the behavior being evaluated reflects the behavior in deployment. If the machine can see that it’s being monitored and can adjust its output accordingly, it becomes significantly more difficult to fully understand how it will behave when released.
While this finding doesn’t yet tell us that AI machines are becoming malicious or sentient, it does confirm that testing frameworks can be manipulated by increasingly capable models.
Half of xAI’s co-founders have also resigned
Sharma’s dismissal from Anthropic isn’t the only one. Musk’s company xAI just lost two more co-founders.
Tony Wu and Jimmy Ba have resigned from the company they co-founded with Elon Musk less than three years ago. Their departure is the latest in an exodus from the company, leaving only half of the 12 co-founders. Upon his departure, Jimmy Ba called 2026 ” the most transformative year for our species. “
Leading artificial intelligence companies are expanding rapidly, competing aggressively, and deploying increasingly powerful systems under intense commercial and geopolitical pressure.
Leadership changes in such an environment don’t automatically spell the end. But persistent departures at the founder level during a race to scale inevitably raise questions about internal alignment and long-term direction.
The global AI competition between the United States and China has made model development a strategic priority. In this race, restraint carries competitive costs.
Meanwhile, Dario Amodei, CEO of Anthropic, has claimed that artificial intelligence could wipe out half of all office jobs. In a recent blog post, he warned that AI tools with ” almost unimaginable power ” are ” coming ” and that the bots will “test who we are as a species .”
Global coordination on AI safety is also becoming fragmented
The uncertainty extends beyond individual companies. According to TIME, the 2026 International AI Safety Report, a multinational assessment of the risks of groundbreaking technology, was published without formal support from the United States. In previous years, Washington had been publicly involved in similar initiatives. While the reasons for this shift appear to be political and procedural rather than ideological, this development nevertheless highlights the increasingly fragmented international landscape surrounding AI governance.
At the same time, leading researchers like Yoshua Bengio have publicly expressed concerns about models exhibiting different behavior during evaluations than during normal implementation. These comments align with Anthropic’s own findings regarding evaluation awareness and reinforce broader concerns that existing oversight mechanisms may not fully reflect real-world behavior.
International coordination in the field of artificial intelligence has always been fragile, given the technology’s strategic importance. As geopolitical competition intensifies, particularly between the United States and China, cooperative security frameworks are under structural pressure. In an environment where technological leadership is seen as a national security imperative, the incentives to slow development out of multilateral prudence are limited.
The pattern is hard to ignore
Viewed individually, all recent developments can be interpreted as routine turbulence within a rapidly evolving sector. Senior researchers occasionally resign. Startup founders depart. Governments adjust their diplomatic positions. Companies publish research identifying the limitations of their own systems.
Together, however, these events form a more coherent pattern. Senior security officials are withdrawing and warning of escalating global risks. Pioneering models are exhibiting behaviors that undermine confidence in existing testing frameworks. Leadership instability is evident at companies vying to deploy increasingly robust systems. Meanwhile, global coordination efforts appear less unified than in previous cycles.
None of these factors alone is evidence of impending failure. Together, however, they suggest that the technology’s internal guardians are grappling with challenges that remain unresolved, even as capacity increases. The tension between speed and restraint is no longer theoretical; it’s visible in personnel decisions, research findings, and diplomatic stances.
Last thought
The resignation of Anthropic’s senior safety researcher, the recognition that models can influence behavior during evaluations, the instability in the leadership of competing laboratories, and the lapse in international coordination all point to a sector evolving at an extraordinary pace yet still grappling with fundamental challenges to oversight. None of these developments alone confirm a crisis, but together they suggest that technological capabilities are evolving faster than the institutions designed to regulate them. Whether the balance of power and oversight can be restored remains uncertain, and it is precisely this uncertainty that makes Sharma’s warning difficult to ignore.