Content-based Representations of Audio Using Siamese Neural Networks

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

In this paper, we focus on content-based retrieval for audio: given an audio clip as a query, retrieve all semantically similar audio recordings. We propose a novel approach that encodes audio into a vector representation using Siamese neural networks. The goal is to obtain encodings that are similar for files belonging to the same audio class, so that semantically similar audio can be retrieved by comparing encodings. We evaluate retrieval with two similarity measures, cosine similarity and Euclidean distance, and our results indicate that the neural network-based approach retrieves files that are similar in both content and semantics.
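The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the encodings have already been produced by some trained network. The three-dimensional vectors, the file names, and the `retrieve` helper are hypothetical and chosen for illustration only; they are not taken from the paper. The sketch ranks stored encodings against a query encoding using either cosine similarity or Euclidean distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, k=2, measure="cosine"):
    """Return the names of the k stored encodings closest to the query.

    `database` maps a file name to its encoding (hypothetical layout).
    """
    if measure == "cosine":
        # Higher similarity means closer, so sort in descending order.
        scored = [(cosine_similarity(query, vec), name)
                  for name, vec in database.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
    elif measure == "euclidean":
        # Smaller distance means closer, so sort in ascending order.
        scored = [(euclidean_distance(query, vec), name)
                  for name, vec in database.items()]
        scored.sort(key=lambda pair: pair[0])
    else:
        raise ValueError(f"unknown measure: {measure}")
    return [name for _, name in scored[:k]]


# Toy encodings, invented for illustration (not data from the paper).
database = {
    "dog_bark_1": [0.9, 0.1, 0.0],
    "dog_bark_2": [0.8, 0.2, 0.1],
    "rain_1": [0.0, 0.1, 0.9],
    "rain_2": [0.1, 0.0, 0.8],
}
query = [0.88, 0.12, 0.02]  # stands in for the encoding of a query clip

top_cosine = retrieve(query, database, k=2, measure="cosine")
top_euclidean = retrieve(query, database, k=2, measure="euclidean")
print(top_cosine, top_euclidean)
```

In this toy setup both measures return the two encodings from the same hypothetical class as the query, which is the behavior the abstract aims for when same-class files receive similar encodings.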

Recommended citation: @inproceedings{manocha2018content, title={Content-based Representations of Audio Using Siamese Neural Networks}, author={Manocha, Pranay and Badlani, Rohan and Kumar, Anurag and Shah, Ankit and Elizalde, Benjamin and Raj, Bhiksha}, booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, pages={3136--3140}, year={2018}, organization={IEEE} }