Triple Attention Network architecture for MovieQA

Ankit Shah

Published in arXiv preprint arXiv:2111.09531, 2021

We propose a Triple Attention Network architecture for question answering on movies. Our model incorporates three attention mechanisms that jointly attend to visual frames, subtitles, and plot descriptions to answer complex questions about movie content. The triple attention mechanism enables effective reasoning across multiple modalities, achieving competitive performance on the MovieQA benchmark.
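As a rough illustration of the idea in the abstract, the sketch below runs one question-conditioned attention pass over each of the three modalities and fuses the results to score candidate answers. It is a minimal sketch under stated assumptions, not the paper's implementation: the dot-product attention, the element-wise sum used for fusion, the multiple-choice answer scoring, and all names (`attend`, `triple_attention_scores`, the toy vectors) are hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Dot-product attention: weight each item by its similarity to the query
    and return the weighted sum together with the attention weights."""
    weights = softmax([dot(query, item) for item in items])
    summary = [
        sum(w * item[i] for w, item in zip(weights, items))
        for i in range(len(query))
    ]
    return summary, weights


def triple_attention_scores(question, frames, subtitles, plot, answers):
    """Score candidate answers from three attended modalities.

    Assumption: every input is already embedded as vectors of equal size.
    """
    # One attention pass per modality, each conditioned on the question.
    visual, _ = attend(question, frames)
    spoken, _ = attend(question, subtitles)
    story, _ = attend(question, plot)
    # Hypothetical fusion step: element-wise sum of the three summaries.
    fused = [v + s + p for v, s, p in zip(visual, spoken, story)]
    # Compare the fused context with each candidate answer embedding.
    return softmax([dot(fused, answer) for answer in answers])


# Toy 2-dimensional embeddings, invented purely for illustration.
question = [1.0, 0.0]
frames = [[1.0, 0.0], [0.0, 1.0]]
subtitles = [[1.0, 0.0], [0.0, 1.0]]
plot = [[1.0, 0.0], [0.0, 1.0]]
answers = [[1.0, 0.0], [0.0, 1.0]]

scores = triple_attention_scores(question, frames, subtitles, plot, answers)
best = max(range(len(scores)), key=lambda i: scores[i])
```

In this toy setup the question aligns with the first item of every modality, so the first candidate answer receives the higher score. A real system would replace the toy vectors with learned encoders for frames, subtitles, and plot text.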

Recommended citation: @article{shah2021triple, title={Triple Attention Network Architecture for MovieQA}, author={Shah, Ankit and Lin, Tzu-Hsiang and Wu, Shijie}, journal={arXiv preprint arXiv:2111.09531}, year={2021} }