Apple's AI Safety Model Decrypted: Unveiling Its Content Filtering Mechanisms
2025-07-07
This project decrypts the filter files behind Apple's AI safety model, which hold content rules for various models. By attaching LLDB to the process that loads them and driving it with custom scripts, the encryption key can be obtained at run time and the files decrypted. The resulting JSON files spell out how harmful content is filtered and safety compliance enforced: exact keywords that trigger rejection, phrases to remove from output, and regular-expression patterns to match. The project publishes both the decrypted rule files and the decryption scripts, so researchers can study Apple's AI safety mechanisms directly.
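The post doesn't reproduce the extraction scripts here, but the general shape of an LLDB-driven key dump can be sketched with LLDB's Python API. Everything specific below is a hypothetical placeholder, not the project's actual script: the process name, the breakpoint symbol, the register holding the key pointer, and the key length are all assumptions for illustration.

```python
# Hypothetical sketch of dumping an in-memory key via LLDB's Python API.
# Process name, symbol, register, and key size are illustrative placeholders.
import lldb

debugger = lldb.SBDebugger.Create()
debugger.SetAsync(False)

target = debugger.CreateTarget("")
error = lldb.SBError()
# Attach to the (hypothetical) process that loads the encrypted filter files.
process = target.AttachToProcessWithName(
    debugger.GetListener(), "SafetyInferenceProvider", False, error)

# Break where the decryption key is (hypothetically) materialized in memory.
bp = target.BreakpointCreateByName("decryption_key_setup")  # placeholder symbol
process.Continue()

frame = process.GetSelectedThread().GetFrameAtIndex(0)

# Assume, purely for illustration, the key pointer lands in x0 on arm64
# and the key is 32 bytes long.
key_ptr = frame.FindRegister("x0").GetValueAsUnsigned()
key = process.ReadMemory(key_ptr, 32, error)
print(key.hex())

process.Detach()
```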
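To give a concrete feel for the three rule types the summary names, here is a minimal Python sketch that applies them to a piece of text. The JSON field names (`reject`, `remove`, `regexReject`) are assumptions about the decrypted files' layout, chosen to mirror the rule categories described above, not confirmed key names from the actual files.

```python
import json
import re

# Hypothetical decrypted rule file; the field names are assumptions
# mirroring the three rule types the post describes.
rules = json.loads("""
{
  "reject": ["blocked phrase"],
  "remove": ["strip this"],
  "regexReject": ["(?i)forbidden\\\\s+pattern"]
}
""")

def apply_safety_rules(text: str) -> str | None:
    """Return filtered text, or None if the input is rejected outright."""
    # Exact keyword matching: reject the whole input on a hit.
    if any(kw in text for kw in rules["reject"]):
        return None
    # Regular-expression filtering: also a hard reject on a match.
    if any(re.search(p, text) for p in rules["regexReject"]):
        return None
    # Phrase removal: excise listed phrases but keep the rest.
    for phrase in rules["remove"]:
        text = text.replace(phrase, "")
    return text

print(apply_safety_rules("hello, strip this please"))  # -> "hello,  please"
print(apply_safety_rules("a blocked phrase here"))     # -> None
```

The split into hard rejection (keywords and regexes) versus in-place removal matches the distinction the decrypted rules draw between content that blocks a response entirely and content that is merely scrubbed from it.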