Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms

Published in arXiv preprint arXiv:2303.03591, 2023

We propose an approach to learning generalized audio representations that combines batch embedding covariance regularization with Constant-Q transform (CQT) input features. The regularizer acts on the covariance matrix of the embedding vectors within each batch, encouraging the model to learn diverse, decorrelated features. The CQT provides logarithmically spaced frequency bins, which roughly mirror human pitch perception. We show that combining CQT input features with this regularizer yields audio representations that are more robust and transfer better across a range of downstream tasks.
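The abstract does not give the regularizer's exact form. As a minimal sketch, assuming a penalty like the covariance term in VICReg (the sum of squared off-diagonal entries of the batch covariance matrix, divided by the embedding dimension), the idea can be written in NumPy as follows. The function name `covariance_penalty` is hypothetical, and the paper's actual loss may differ.

```python
import numpy as np


def covariance_penalty(embeddings: np.ndarray) -> float:
    """Penalize correlation between embedding dimensions within one batch.

    Hypothetical sketch modeled on the VICReg covariance term; not
    necessarily the regularizer used in the paper.

    embeddings: array of shape (batch_size, dim), one embedding per row.
    """
    n, d = embeddings.shape
    if n < 2:
        raise ValueError("need at least two embeddings to estimate covariance")
    # Center each dimension on its batch mean.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Unbiased (d, d) covariance matrix; equals np.cov(embeddings, rowvar=False).
    cov = centered.T @ centered / (n - 1)
    # Keep only off-diagonal entries: the cross-dimension covariances.
    off_diag = cov - np.diag(np.diag(cov))
    # Squared off-diagonal mass, normalized by the embedding dimension.
    return float((off_diag ** 2).sum() / d)


# Decorrelated dimensions give zero penalty; perfectly correlated ones do not.
independent = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
correlated = np.array([[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]])
print(covariance_penalty(independent))
print(covariance_penalty(correlated))
```

During training, a term like this would typically be added to the task loss with a weight, e.g. `loss = task_loss + lam * covariance_penalty(z)`, where `lam` is a hypothetical hyperparameter. In practice it would be written in an autodiff framework so that gradients flow back into the encoder.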
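For reference, the standard constant-Q transform places bin k at f_k = f_min * 2^(k/B) for B bins per octave. The bandwidth of bin k is f_k * (2^(1/B) - 1), so every bin shares the same quality factor Q = f_k / bandwidth_k = 1 / (2^(1/B) - 1). The short sketch below computes these quantities. The helper names and the values f_min = 55 Hz and B = 12 are illustrative assumptions, not the paper's reported settings.

```python
def cqt_center_frequencies(f_min: float, n_bins: int, bins_per_octave: int = 12) -> list[float]:
    """Geometrically spaced CQT bin centers: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave: int = 12) -> float:
    """Constant ratio of center frequency to bandwidth, shared by all bins."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative settings (hypothetical): 25 semitone bins starting at 55 Hz.
freqs = cqt_center_frequencies(f_min=55.0, n_bins=25, bins_per_octave=12)
print(freqs[0], freqs[12], freqs[24])
print(cqt_quality_factor(12))
```

With 12 bins per octave, bins 12 and 24 fall exactly one and two octaves above f_min (110 Hz and 220 Hz here), and Q is about 16.8 for every bin. A short-time Fourier transform, by contrast, gives all bins the same absolute bandwidth, so its relative resolution is poor at low frequencies and finer than needed at high ones.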

Recommended citation: @article{shah2023approach, title={Approach to Learning Generalized Audio Representation Through Batch Embedding Covariance Regularization and Constant-Q Transforms}, author={Shah, Ankit and Chen, Shuyi and Zhou, Kejun and Chen, Yue and Raj, Bhiksha}, journal={arXiv preprint arXiv:2303.03591}, year={2023} }