Reasoning for Audio Visual Scene-Aware Dialog Track in DSTC10

Published in Dialog System Technology Challenge 10 (DSTC10), 2021

This paper presents our approach to the Audio Visual Scene-Aware Dialog (AVSD) track in DSTC10. We introduce reasoning methods that enable the dialog system to understand and respond to questions about video content by leveraging both audio and visual modalities. Our approach demonstrates improved temporal reasoning capabilities for grounding dialog responses in specific moments within the video.

Recommended citation:

@inproceedings{geng2021reasoning,
  title={Reasoning for Audio Visual Scene-Aware Dialog Track in DSTC10},
  author={Geng, Shijie and Gao, Peng and Cherian, Anoop and Marks, Tim K and Hori, Chiori and Shah, Ankit},
  booktitle={Dialog System Technology Challenge 10 (DSTC10)},
  year={2021}
}