Redis Vector Sets: Replicating Hacker News Account Style Detection
Inspired by a three-year-old Hacker News post about detecting similar accounts with cosine similarity, Antirez replicated the experiment using the new vector set functionality in Redis 8 RC1. He downloaded 10 GB of Hacker News comment data, then cleaned and preprocessed it into a JSONL file that maps each user to a word frequency vector. He normalized those vectors with the Burrows-Delta method and inserted them into Redis vector sets. The VSIM command can then quickly find users with similar writing styles. The project code has been open-sourced, and an online demo website is available.
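The normalization and similarity steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: it treats Burrows-Delta normalization as z-scoring each word's relative frequency across all users, and the user names, word counts, vocabulary, and the `hn_users` key are all hypothetical. The `vadd_args` helper only builds the argument list for a `VADD key VALUES n v1 ... vn element` call; it does not talk to a Redis server.

```python
import math


def relative_frequencies(counts, vocabulary):
    """Relative frequency of each vocabulary word in one user's comments."""
    total = sum(counts.values())
    return [counts.get(w, 0) / total if total else 0.0 for w in vocabulary]


def burrows_delta_vectors(user_counts, vocabulary):
    """Z-score each word's relative frequency across all users.

    Assumption: this is the normalization step of Burrows' Delta
    (subtract the per-word mean, divide by the per-word standard deviation).
    """
    users = sorted(user_counts)
    freqs = {u: relative_frequencies(user_counts[u], vocabulary) for u in users}
    vectors = {u: [] for u in users}
    for i in range(len(vocabulary)):
        column = [freqs[u][i] for u in users]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for u in users:
            # A word used identically by everyone carries no style signal.
            z = (freqs[u][i] - mean) / std if std > 0 else 0.0
            vectors[u].append(z)
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vadd_args(key, user, vector):
    """Argument list for a Redis VADD call (built here, never sent)."""
    return ["VADD", key, "VALUES", str(len(vector)), *map(str, vector), user]


# Hypothetical toy data: word counts per user over a tiny vocabulary.
VOCABULARY = ["the", "of", "and", "i"]
USER_COUNTS = {
    "alice": {"the": 50, "of": 30, "and": 15, "i": 5},
    "alice_alt": {"the": 48, "of": 31, "and": 16, "i": 5},
    "bob": {"the": 20, "of": 10, "and": 30, "i": 40},
}

vectors = burrows_delta_vectors(USER_COUNTS, VOCABULARY)
for other in ("alice_alt", "bob"):
    score = cosine_similarity(vectors["alice"], vectors[other])
    print(f"alice vs {other}: {score:.3f}")
```

With the vectors stored in a vector set, a query along the lines of `VSIM hn_users ELE alice WITHSCORES COUNT 10` would ask Redis for the nearest neighbors of an existing element, replacing the manual cosine loop above.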