Author: Meena Kowshalya

57193347708

Publications - 1

A vision explainability method for image captioning using transformer decoder attention maps

Dragan Pamucar, Rajesh Kumar Dhanaraj, Meena Kowshalya, Suchitra

Publication Name: MethodsX

Publication Date: 2025-12-01

Volume: 15

Issue: Unknown

Page Range: Unknown

Description:

Image captioning is a crucial task that enables systems to generate descriptive sentences for visual content. Although image captioning sits at the intersection of computer vision and natural language processing, most captioning models act as black boxes, offering little or no insight into how captions are derived. We present a novel explainable image captioning framework that integrates a Convolutional Neural Network encoder with a Transformer decoder. Attention-based heatmaps are used to explain the model's visual focus, offering transparency in the decision-making process. Captioning quality and interpretability are evaluated on the MS COCO dataset using BLEU, METEOR, CIDEr and SPICE. The method enhances trustworthiness and transparency, making it reliable for applications such as healthcare, education, security, surveillance and forecasting. The work offers a reproducible method for integrating visual explainability into image captioning by exploring transformer decoder attention maps. It contributes to the growing body of eXplainable AI (XAI) work by addressing the transparency gap in vision-language models. It balances performance with interpretability, paving the way for more transparent and trustworthy AI systems.
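The attention-heatmap idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `attention_to_heatmap`, the assumption that one decoder step yields a flat vector of attention weights over a CNN feature grid, and the nearest-neighbour upsampling are all illustrative choices made here.

```python
import numpy as np


def attention_to_heatmap(attn, grid_hw, image_hw):
    """Map one decoding step's attention weights onto image coordinates.

    attn     : flat sequence of length gh * gw (one weight per feature-grid cell)
    grid_hw  : (gh, gw), the assumed spatial size of the encoder feature grid
    image_hw : (ih, iw), the target heatmap size; must be multiples of the grid
    """
    gh, gw = grid_hw
    ih, iw = image_hw
    attn = np.asarray(attn, dtype=np.float64)
    if attn.shape != (gh * gw,):
        raise ValueError("attn must have exactly gh * gw entries")
    if ih % gh or iw % gw:
        raise ValueError("image size must be a multiple of the grid size")

    # Restore the spatial layout of the feature grid.
    grid = attn.reshape(gh, gw)

    # Min-max normalise to [0, 1] so the map can be rendered as a heatmap.
    span = grid.max() - grid.min()
    norm = (grid - grid.min()) / span if span > 0 else np.zeros_like(grid)

    # Nearest-neighbour upsampling: each grid cell becomes a block of pixels.
    return np.repeat(np.repeat(norm, ih // gh, axis=0), iw // gw, axis=1)


# Hypothetical attention weights for a 2x2 feature grid and a 4x4 image.
heat = attention_to_heatmap([0.1, 0.2, 0.3, 0.4], (2, 2), (4, 4))
```

In practice such a map would be computed per generated word and overlaid on the input image; smoother interpolation than the block upsampling shown here is a common alternative.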

Open Access: Yes

DOI: 10.1016/j.mex.2025.103744