A Cute Bug in HyperQueue: SIGTERM and the Ten-Second Mystery

2025-02-24

A curious bug emerged in HyperQueue, a Rust-based distributed task scheduler. Tasks, particularly those sleeping for more than 10 seconds, would mysteriously terminate. Debugging traced the problem to a seemingly innocuous optimization: offloading process spawning to `tokio::task::spawn_blocking`. The trouble came from its interaction with `PR_SET_PDEATHSIG`, which was configured to send SIGTERM to a task when its parent dies. On Linux, the "parent" for this purpose is the thread that created the child, not the whole parent process. Tokio shuts down idle blocking-pool threads after a keep-alive period (10 seconds by default), so once the thread that had spawned a task exited, the kernel delivered SIGTERM to the still-running task. The bug was fixed by reverting the optimization, and it highlights the subtle interactions between concurrency, system calls, and thread management.
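The mechanism can be reproduced without Tokio. The following is a minimal, Linux-only sketch and not HyperQueue's actual code: a plain `std::thread` stands in for a blocking-pool worker, the helper name `signal_after_spawner_exit` is hypothetical, and `prctl` is declared by hand (with constant values taken from the Linux headers) only to avoid depending on the `libc` crate.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::{CommandExt, ExitStatusExt};
use std::process::Command;
use std::thread;

// Hand-written declaration of the C library's prctl(2) wrapper
// (an assumption made to keep the sketch free of external crates).
unsafe extern "C" {
    fn prctl(option: c_int, ...) -> c_int;
}

const PR_SET_PDEATHSIG: c_int = 1; // value from <linux/prctl.h>
const SIGTERM: c_int = 15;

/// Spawns `sleep 5` from a short-lived thread and returns the signal,
/// if any, that terminated the child. Hypothetical helper for illustration.
fn signal_after_spawner_exit() -> io::Result<Option<i32>> {
    // This thread plays the role of a blocking-pool worker: it spawns
    // the child process and then exits almost immediately.
    let spawner = thread::spawn(|| {
        let mut cmd = Command::new("sleep");
        cmd.arg("5");
        unsafe {
            // Runs in the child between fork and exec: ask the kernel to
            // send SIGTERM when the "parent" (the spawning thread) dies.
            cmd.pre_exec(|| {
                if prctl(PR_SET_PDEATHSIG, SIGTERM as c_ulong) != 0 {
                    return Err(io::Error::last_os_error());
                }
                Ok(())
            });
        }
        cmd.spawn()
    });

    // The spawning thread has terminated once `join` returns, while the
    // process itself (this main thread) is still alive.
    let mut child = spawner.join().expect("spawner thread panicked")?;
    Ok(child.wait()?.signal())
}

fn main() -> io::Result<()> {
    let signal = signal_after_spawner_exit()?;
    println!("child terminated by signal: {:?}", signal);
    Ok(())
}
```

On Linux, the child should be reported as killed by SIGTERM (signal 15) well before its five seconds are up, even though the parent process never exits. In the Tokio case the only difference is timing: the worker thread lingers for the keep-alive period before exiting, which is why only longer-running tasks were affected.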

Development