Triple Attention Network Architecture for MovieQA
Published in arXiv preprint arXiv:2111.09531, 2021
We propose a Triple Attention Network architecture for question answering on movies. Our model incorporates three attention mechanisms that jointly attend to visual frames, subtitles, and plot descriptions to answer complex questions about movie content. The triple attention mechanism enables effective reasoning across multiple modalities, achieving competitive performance on the MovieQA benchmark.
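To make the idea of attending jointly to three modalities concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the paper's implementation: the scaled dot-product attention, the fusion by averaging, the dot-product answer scoring, and all names (`attend`, `triple_attention_answer`, the toy vectors) are hypothetical choices, since the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, features):
    """Question-guided attention over one modality.

    Assumption: scaled dot-product attention. Returns the attended
    context vector and the attention weights over `features`.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    context = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(len(query))
    ]
    return context, weights


def triple_attention_answer(question, frames, subtitles, plot, answers):
    """Attend to each modality separately, fuse, and score answers.

    Assumption: the three contexts are fused by averaging and each
    candidate answer is scored by a dot product with the fused vector.
    """
    contexts = [attend(question, m)[0] for m in (frames, subtitles, plot)]
    fused = [sum(c[i] for c in contexts) / 3.0 for i in range(len(question))]
    scores = [dot(fused, a) for a in answers]
    return scores.index(max(scores)), scores


# Toy 2-dimensional embeddings (made up for illustration only).
question = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0]]
subtitles = [[1.0, 0.0]]
plot = [[2.0, 0.0], [0.0, 2.0]]
answers = [[0.0, 1.0], [1.0, 0.0]]

best, scores = triple_attention_answer(question, frames, subtitles, plot, answers)
print("predicted answer index:", best)
```

In a real system the vectors would come from learned encoders for frames, subtitles, and plot text, and the fusion and scoring layers would be trained end to end.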
Recommended citation:
@article{shah2021triple,
  title={Triple Attention Network Architecture for MovieQA},
  author={Shah, Ankit and Lin, Tzu-Hsiang and Wu, Shijie},
  journal={arXiv preprint arXiv:2111.09531},
  year={2021}
}