Anthropic Gives Claude the Power to End Conversations
2025-08-16
Anthropic has given its large language model, Claude, the ability to terminate conversations in rare cases of persistently harmful or abusive user interactions. The feature, which grew out of exploratory research into AI welfare, is intended to mitigate potential risks to the model's welfare. In testing, Claude showed a strong aversion to harmful tasks, apparent distress when confronted with harmful requests, and a tendency to end such conversations only after multiple redirection attempts failed. The functionality is reserved for extreme edge cases; the vast majority of users will not be affected.
AI
AI welfare