A Closer Look at Weak Label Learning for Audio Events

Carnegie Mellon University, Language Technologies Institute

* denotes equal contribution

Abstract

Audio content analysis in terms of sound events is an important research problem for a variety of applications. Recently, the development of weak labeling approaches for audio or sound event detection (AED) and the availability of large-scale weakly labeled datasets have finally opened up the possibility of large-scale AED. However, a deeper understanding of how weak labels affect learning for sound events is still missing from the literature. In this work, we first describe a CNN-based approach for weakly supervised training of audio events. The approach follows some basic design principles desirable in a learning method relying on weakly labeled audio. We then describe important characteristics that naturally arise in weakly supervised learning of sound events, and show how these aspects of weak labels affect the generalization of models. More specifically, we study how characteristics such as label density and corruption of labels affect weakly supervised training for audio events. We also study the feasibility of directly obtaining weakly labeled data from the web without any manual labeling and compare it with a dataset that has been manually labeled. The analysis and understanding of these factors should be taken into account in the development of future weak label learning methods. Audioset, a large-scale weakly labeled dataset for sound events, is used in our experiments.
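To make the terms in the abstract concrete, here is a minimal NumPy sketch of weak labels, label density, and label corruption. It is an illustration under stated assumptions, not the paper's implementation: all function names are hypothetical, label density is assumed to mean the fraction of segments in a recording where an event is active, max pooling is assumed as the way segment-level scores become a clip-level score, and corruption is modeled as flipping a random fraction of clip-level label entries.

```python
import numpy as np


def label_density(segment_activity):
    """Fraction of segments in which each class is active.

    segment_activity: binary array of shape (segments, classes).
    Assumed definition of label density, for illustration only.
    """
    return segment_activity.mean(axis=0)


def weak_labels_from_segments(segment_activity):
    """Clip-level (weak) labels: a class is present if it is active
    in at least one segment. Timing information is discarded."""
    return segment_activity.max(axis=0)


def pool_segment_scores(segment_scores):
    """Pool per-segment class scores into one clip-level score per class.

    Max pooling is an assumption here; other pooling choices are possible.
    """
    return segment_scores.max(axis=0)


def corrupt_labels(labels, fraction, rng):
    """Return a copy of a binary label matrix with a given fraction
    of its entries flipped (hypothetical noise model)."""
    n_flip = int(round(fraction * labels.size))
    flat = labels.copy().ravel()
    idx = rng.choice(flat.size, size=n_flip, replace=False)
    flat[idx] = 1 - flat[idx]
    return flat.reshape(labels.shape)


# One recording, 4 segments, 2 classes: class 0 occurs in a single segment.
activity = np.array([[1, 0], [0, 0], [0, 0], [0, 0]])
density = label_density(activity)           # low density for class 0
weak = weak_labels_from_segments(activity)  # the only label seen in training

# Corrupt half of the entries of a small clip-level label matrix.
clean = np.zeros((4, 2), dtype=int)
noisy = corrupt_labels(clean, 0.5, np.random.default_rng(0))
```

In this toy recording the weak label marks class 0 as present even though it is active in only one of four segments, which is the kind of low-density situation the abstract refers to.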

Network Architecture

Paper

Citation

Ankit Shah, Anurag Kumar, Alexander Hauptmann, Bhiksha Raj. A Closer Look at Weak Label Learning for Audio Events. Submitted to IEEE JSTSP Special Issue: Machine Learning for Audio Processing.

@article{shah2018closer,
  title={A Closer Look at Weak Label Learning for Audio Events},
  author={Shah, Ankit and Kumar, Anurag and Hauptmann, Alexander G and Raj, Bhiksha},
  journal={arXiv preprint arXiv:1804.09288},
  year={2018}
}

Weak Audio Learning Related Papers

Anurag Kumar, Bhiksha Raj. Weakly Supervised Scalable Audio Content Analysis. IEEE International Conference on Multimedia and Expo (ICME), July 2016.

Romain Serizel, Nicolas Turpault, Hamid Eghbal-Zadeh, Ankit Parag Shah. Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments. DCASE2018 Workshop, Nov 2018.

Other Works Related to “Audio Event” Learning

Szu-Yu Chou, Jyh-Shing Roger Jang, Yi-Hsuan Yang. Learning to Recognize Transient Sound Events Using Attentional Supervision. 27th International Joint Conference on Artificial Intelligence, 2018.

Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann. Class-aware Self-Attention for Audio Event Recognition. International Conference on Multimedia Retrieval, 2018.

Funding

This research was supported by:

Carnegie Mellon University

Contact

For questions or comments, contact Ankit Shah or Anurag Kumar.