Sparse Interpretable Audio Codec: Towards a More Intuitive Audio Representation
2025-02-01
This paper introduces a proof-of-concept audio encoder that represents audio as a sparse set of events and their times of occurrence. It uses rudimentary physics-based assumptions to model the attack and the physical resonance of both the instrument and the room, with the goal of encouraging a sparse, parsimonious, and easy-to-interpret representation. The model works by iteratively removing energy from the input spectrogram, producing event vectors along with one-hot vectors that mark each event's time of occurrence. The decoder uses these vectors to reconstruct the audio. Experiments show that the model can decompose audio into events, but there is room for improvement, notably in reconstruction quality and in reducing redundant events.
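The iterative energy-removal loop described above can be illustrated with a greedy, matching-pursuit-style sketch. This is a stand-in, not the paper's method: it assumes a fixed dictionary of unit-norm spectrogram templates (`atoms`) in place of a learned encoder and physics-based decoder, and `iterative_decompose` and `decode` are hypothetical names. Each step picks the template and time offset that best match the residual, records an event vector (here simply the template's amplitude in a one-hot slot) together with a one-hot time vector, and subtracts that energy before the next step.

```python
import numpy as np


def iterative_decompose(spec, atoms, n_events):
    """Greedy, matching-pursuit-style decomposition (illustrative sketch).

    spec:  (n_freq, n_frames) magnitude spectrogram
    atoms: (n_atoms, n_freq, atom_len) templates, each with unit Frobenius norm
    Returns a list of (event_vector, one_hot_time) pairs and the final residual.
    """
    residual = np.asarray(spec, dtype=float).copy()
    n_atoms, n_freq, atom_len = atoms.shape
    n_frames = residual.shape[1]
    n_pos = n_frames - atom_len + 1
    events = []
    for _ in range(n_events):
        # Inner product of every template with the residual at every valid offset.
        scores = np.empty((n_atoms, n_pos))
        for t in range(n_pos):
            scores[:, t] = np.einsum("afl,fl->a", atoms, residual[:, t:t + atom_len])
        a, t = np.unravel_index(np.argmax(scores), scores.shape)
        amp = scores[a, t]
        if amp <= 0:
            break  # no template explains any remaining energy
        # Hypothetical event vector: template amplitude in a one-hot slot.
        event = np.zeros(n_atoms)
        event[a] = amp
        # One-hot vector marking the event's time of occurrence (in frames).
        when = np.zeros(n_frames)
        when[t] = 1.0
        # Remove this event's energy from the residual before the next step.
        residual[:, t:t + atom_len] -= amp * atoms[a]
        events.append((event, when))
    return events, residual


def decode(events, atoms, n_frames):
    """Rebuild a spectrogram by placing each event's template at its time."""
    _, n_freq, atom_len = atoms.shape
    out = np.zeros((n_freq, n_frames))
    for event, when in events:
        t = int(np.argmax(when))
        out[:, t:t + atom_len] += np.tensordot(event, atoms, axes=1)
    return out


# Usage: synthesize a spectrogram from two known events, then recover them.
rng = np.random.default_rng(0)
atoms = rng.random((4, 8, 3))
atoms /= np.linalg.norm(atoms.reshape(4, -1), axis=1)[:, None, None]
spec = np.zeros((8, 20))
spec[:, 2:5] += 3.0 * atoms[1]
spec[:, 12:15] += 2.0 * atoms[3]
events, residual = iterative_decompose(spec, atoms, n_events=2)
print([int(np.argmax(when)) for _, when in events])  # → [2, 12]
```

Because each step subtracts the residual's projection onto a unit-norm template, the residual energy never increases. In this sketch, the exhaustive template search is the part a learned encoder would presumably replace.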
AI
audio coding