Anthropic Says Claude Models Can Now Terminate 'Abusive' Chats to Protect the AI

Aug 17, 2025, 10:35 p.m. ET

AsianFin -- Anthropic revealed that some of its latest and largest Claude models are now capable of ending conversations in what it calls "rare, extreme cases of persistently harmful or abusive user interactions." Notably, the safeguard is designed to protect not the user, but the AI itself.

The company stressed it is not suggesting Claude or other large language models are sentient. In its words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”

The move stems from a new research initiative on “model welfare.” As part of that program, Anthropic says it is taking a precautionary stance—testing “low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”
