Audio-visual fine-tuning of audio-only ASR models

Published in arXiv preprint arXiv:2312.09369, 2023

This paper investigates using visual information to fine-tune audio-only automatic speech recognition (ASR) models.

Recommended citation: Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan. "Audio-visual fine-tuning of audio-only ASR models." arXiv preprint arXiv:2312.09369, 2023.