Audio-visual fine-tuning of audio-only ASR models
Published in arXiv preprint arXiv:2312.09369, 2023
Investigates fine-tuning audio-only automatic speech recognition (ASR) models on audio-visual data, so that visual information is used alongside the audio input.
Recommended citation: Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan. "Audio-visual fine-tuning of audio-only ASR models." arXiv preprint arXiv:2312.09369, 2023.