The Ephemeral Web: Preserving Your Digital Legacy

2025-02-25

Online content is fleeting. This article highlights the importance of preserving digital information and advocates plain-text Markdown files as a robust solution. Markdown's readability, cross-platform compatibility, and long-term accessibility are its key advantages. The author shares personal strategies, including using Obsidian for note-taking, static site generators for blogging, and automated Python scripts for content conversion. The article also stresses regular backups and data downloads, and calls for better data-export tools on social media platforms.

Misc

Don't Use Cosine Similarity Carelessly!

2025-01-14

This article explores the risks of over-relying on cosine similarity for vector comparisons in data science. The author argues that, while it is computationally simple, cosine similarity often fails to capture semantic similarity and can easily be misled by superficial patterns such as writing style and typos. The article illustrates this problem with examples and proposes several better approaches, including directly using LLMs for comparison, fine-tuning or transfer learning to create task-specific embeddings, pre-prompt engineering, and text preprocessing. The author emphasizes choosing a similarity metric that fits the specific need rather than defaulting to cosine similarity.
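To see why cosine similarity is called "computationally simple," note that it is just the dot product of two vectors divided by the product of their lengths. The sketch below uses only the Python standard library; the function name `cosine_similarity` and the toy vectors are illustrative choices, not code from the article.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return dot(a, b) / (|a| * |b|), the cosine of the angle between a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The angle (and so the score) is undefined when either vector has zero length.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Same direction, different magnitude: the score is (up to rounding) 1.0.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

The score depends only on direction, so it cannot tell you *which* features made two embeddings point the same way. If surface features such as style or typos dominate an embedding, they will dominate the score as well, which is consistent with the article's warning.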
