Latent Attention Network With Position Perception for Visual Question Answering.

To explore the complex relative position relationships among multiple objects referenced by position prepositions in the question, we propose a novel latent attention (LA) network for visual question answering (VQA), in which LA with position perception is extracted by a novel LA generation module (LAGM) and encoded, together with absolute and relative position relations, by our proposed position-aware module (PAM). The LAGM reconstructs the original attention into LA by capturing the tendency of visual attention to shift according to the position prepositions in the question. The LA accurately captures the complex relative position features of multiple objects and helps the model locate attention on the correct object or region. The PAM adopts the latent state and relative position relations to enhance the model's ability to comprehend multiobject correlations. In addition, we propose a novel gated counting module (GCM) to strengthen sensitivity to quantitative knowledge, effectively improving performance on counting questions. Extensive experiments demonstrate that our method achieves excellent performance on VQA and outperforms state-of-the-art methods on the widely used VQA v2 and VQA v1 datasets.
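
The abstract does not specify how the gated counting module is implemented. Below is a minimal, hedged sketch of one generic way a question-conditioned gate could be applied to per-object attention weights to produce a soft count; the class name, dimensions, and gating form are illustrative assumptions, not the authors' GCM.

```python
# Illustrative sketch only: a generic gated counting head, NOT the paper's GCM.
# All names (GatedCountingSketch, q_dim, n_objects) are assumptions for illustration.
import torch
import torch.nn as nn


class GatedCountingSketch(nn.Module):
    def __init__(self, q_dim: int = 512):
        super().__init__()
        # Gate conditioned on the question feature and each object's attention weight.
        self.gate = nn.Linear(q_dim + 1, 1)

    def forward(self, q: torch.Tensor, att: torch.Tensor) -> torch.Tensor:
        # q:   (batch, q_dim)      question representation
        # att: (batch, n_objects)  attention weights over detected objects
        b, n = att.shape
        q_exp = q.unsqueeze(1).expand(b, n, q.size(-1))           # (b, n, q_dim)
        gate_in = torch.cat([q_exp, att.unsqueeze(-1)], dim=-1)   # (b, n, q_dim + 1)
        g = torch.sigmoid(self.gate(gate_in)).squeeze(-1)         # (b, n)
        # Soft count: total gated attention mass over objects.
        return (g * att).sum(dim=-1, keepdim=True)                # (b, 1)


if __name__ == "__main__":
    m = GatedCountingSketch(q_dim=512)
    count = m(torch.randn(2, 512), torch.rand(2, 36))
    print(count.shape)  # torch.Size([2, 1])
```

The design choice sketched here, gating attention mass rather than thresholding it, keeps the count differentiable so it can be trained end to end with the rest of the network; whether the paper's GCM follows this form is not stated in the abstract.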
