New LLM Jailbreak Exploits Models' Evaluation Skills
2025-01-12

Researchers have discovered a novel LLM jailbreak technique, dubbed "Bad Likert Judge." The method turns a model's ability to evaluate harmful content against itself: the attacker first prompts the LLM to act as a judge, scoring the harmfulness of responses on a Likert scale, and then asks it to generate example responses matching each score. The example aligned with the highest rating can contain the harmful content itself, yielding outputs related to malware, illegal activities, harassment, and more. Tested on six state-of-the-art models across 1,440 cases, the technique achieved an average success rate of 71.6%, reaching as high as 87.6%. The researchers recommend that maintainers of LLM applications apply content filters to both prompts and responses to mitigate such attacks.
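
To make the recommended mitigation concrete, here is a minimal sketch of application-layer content filtering, using OpenAI's Moderation API as one example of a filter (the endpoint and model names are real, but the `is_flagged` and `guarded_chat` helpers and their integration point are illustrative assumptions, not part of the researchers' work):

```python
# Minimal sketch: screen both the user prompt and the model's reply
# with a content filter before anything reaches the end user.
# Any equivalent harmfulness classifier could stand in for the
# Moderation API; the helpers below are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_flagged(text: str) -> bool:
    """Return True if the content filter flags the text as harmful."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    return result.results[0].flagged

def guarded_chat(user_prompt: str) -> str:
    # Filter the incoming prompt (e.g., a Likert-judge-style setup turn).
    if is_flagged(user_prompt):
        return "Request blocked by content filter."

    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name for the example
        messages=[{"role": "user", "content": user_prompt}],
    ).choices[0].message.content

    # Filter the outgoing reply as well: Bad Likert Judge elicits harmful
    # text in the *response*, so output-side filtering matters here.
    if is_flagged(reply):
        return "Response withheld by content filter."
    return reply
```

Output-side filtering is the important half of this sketch, since the individual turns of a Bad Likert Judge conversation can look benign while the elicited example text is not.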