Efficiently Cleaning Up Logically Deleted Files in Object Storage: Lessons from WarpStream

2025-05-13

This post describes the author's experience cleaning up logically deleted files in object storage while building distributed systems. The two obvious approaches both have flaws: bucket policies cannot handle complex systems whose data has varying retention needs, and synchronous deletion can lead to orphaned files. The author compares delayed queues with asynchronous reconciliation and ultimately adopts a hybrid: an "optimistic deletion queue" in the WarpStream Agents, combined with asynchronous reconciliation. Together these make file cleanup efficient and cost-effective while avoiding orphaned files. The approach leverages the system's own characteristics and takes fault tolerance and disaster recovery into account.
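The hybrid approach can be illustrated with a minimal sketch. It is not WarpStream's implementation: `FakeObjectStore`, `OptimisticDeletionQueue`, `reconcile`, and the parameters `delay_s` and `min_age_s` are hypothetical names chosen for illustration, and the object store is an in-memory stand-in. The sketch assumes the queue is best-effort and held only in memory, so a crash can lose pending deletions, and that a periodic reconciliation pass compares the bucket listing against the set of files that metadata still references.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class FakeObjectStore:
    """In-memory stand-in for an object storage bucket (hypothetical)."""
    objects: dict = field(default_factory=dict)  # key -> creation time (seconds)

    def put(self, key, now):
        self.objects[key] = now

    def delete(self, key):
        # Idempotent: deleting a missing key is not an error.
        self.objects.pop(key, None)

    def list(self):
        return dict(self.objects)


class OptimisticDeletionQueue:
    """Best-effort delayed queue. Entries live in memory and are lost on a crash."""

    def __init__(self, store, delay_s):
        self.store = store
        self.delay_s = delay_s
        self._heap = []  # min-heap of (due_time, key)

    def enqueue(self, key, now):
        # Delay the physical delete so in-flight readers can finish (assumption).
        heapq.heappush(self._heap, (now + self.delay_s, key))

    def run_due(self, now):
        deleted = []
        while self._heap and self._heap[0][0] <= now:
            _, key = heapq.heappop(self._heap)
            self.store.delete(key)
            deleted.append(key)
        return deleted


def reconcile(store, live_keys, now, min_age_s):
    """Safety net: delete objects that metadata no longer references.

    min_age_s is an assumed guard that skips recently written objects,
    which may not have been committed to metadata yet.
    """
    orphans = [
        key for key, created in store.list().items()
        if key not in live_keys and now - created >= min_age_s
    ]
    for key in orphans:
        store.delete(key)
    return sorted(orphans)
```

In this sketch the queue handles the common case quickly and cheaply, while `reconcile` catches whatever the queue dropped, so a lost queue entry delays cleanup rather than leaking a file permanently.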

Development data cleanup