Automated Audio Captioning and Language-Based Audio Retrieval

Ankit Shah


Published in arXiv preprint arXiv:2207.04156, 2022

This work addresses two complementary tasks: automated audio captioning, which generates natural-language descriptions of audio clips, and language-based audio retrieval, which retrieves audio clips that match a textual query. We present a unified framework that uses cross-modal learning between audio and text representations to support both caption generation and retrieval. Our approach demonstrates strong performance on standard benchmarks for both tasks.

Recommended citation:
@article{gomes2022automated,
  title={Automated Audio Captioning and Language-Based Audio Retrieval},
  author={Gomes, Clive and Park, Hyejin and Kollman, Patrick and Song, Yi and Houndayi, Iffanice and Shah, Ankit},
  journal={arXiv preprint arXiv:2207.04156},
  year={2022}
}
