Rajesh Kumar Dhanaraj
60250668400
Publications - 2
A vision explainability method for image captioning using transformer decoder attention maps
Publication Name: MethodsX
Publication Date: 2025-12-01
Volume: 15
Issue: Unknown
Page Range: Unknown
Description:
Image captioning is a crucial task that enables systems to generate descriptive sentences for visual content. Although image captioning systems thrive at the intersection of Computer Vision and Natural Language Processing, these models act mostly as black boxes, offering little or no insight into how captions are derived. We present a novel explainable image captioning framework that integrates a Convolutional Neural Network encoder with a Transformer decoder. Attention-based heatmaps provide visual explanations, adding transparency to the decision-making process. The method evaluates captioning quality and interpretability on the MS COCO dataset using BLEU, METEOR, CIDEr and SPICE. The method enhances trustworthiness and transparency, making it reliable for applications like healthcare, education, security, surveillance and forecasting.
• A reproducible method for integrating visual explainability into image captioning using transformer decoder attention maps.
• The method contributes to the growing body of eXplainable AI (XAI) work by addressing the transparency gap in vision-language models.
• Balances performance with interpretability, paving the way for more transparent and trustworthy AI systems.
Open Access: Yes
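The abstract above does not say how the attention-based heatmaps are built, so the following is only a minimal sketch of one common approach, under stated assumptions: the decoder's attention for a single generated word is available as a flat list of weights over a grid of encoder features, and the heatmap is obtained by min-max normalisation followed by nearest-neighbour upsampling. The function name `attention_to_heatmap`, its parameters, and the demo weights are all hypothetical and not taken from the paper.

```python
def attention_to_heatmap(weights, grid_h, grid_w, scale):
    """Turn one caption word's attention over grid_h * grid_w image cells
    into an upsampled heatmap with values in [0, 1]."""
    if len(weights) != grid_h * grid_w:
        raise ValueError("expected grid_h * grid_w attention weights")
    lo, hi = min(weights), max(weights)
    span = hi - lo
    # Min-max normalise to [0, 1]; a perfectly flat map becomes all zeros.
    norm = [(w - lo) / span if span > 0 else 0.0 for w in weights]
    # Reshape the flat vector into grid rows.
    grid = [norm[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]
    # Nearest-neighbour upsampling: each cell becomes a scale x scale block.
    heatmap = []
    for row in grid:
        wide = [v for v in row for _ in range(scale)]
        heatmap.extend(list(wide) for _ in range(scale))
    return heatmap


# Hypothetical attention weights for one generated word over a 2 x 2 grid.
demo_weights = [0.1, 0.2, 0.3, 0.4]
demo_heatmap = attention_to_heatmap(demo_weights, grid_h=2, grid_w=2, scale=2)
```

In practice the resulting grid would be overlaid on the input image, one heatmap per generated word, so a reader can see which regions carried the most attention weight when that word was produced.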
Spectral-aware CNN with learnable biorthogonal units and depthwise convolutions for multi-class blood cell classification
Publication Name: MethodsX
Publication Date: 2025-12-01
Volume: 15
Issue: Unknown
Page Range: Unknown
Description:
Accurate classification and interpretation of peripheral blood cells are critical for the effective and early diagnosis of diseases such as leukemia and anemia. This study proposes a novel hybrid deep learning model for multi-class blood cell classification, called Spectral-Aware CNN with Learnable Spectral Biorthogonal Downsampling Units (LSBDUs) and Depthwise Separable Convolutions. The model replaces conventional pooling layers with wavelet-inspired LSBDUs for improved feature retention, and it reduces computational overhead through efficient separable convolutions. The research used a balanced dataset of 17,092 images across eight blood cell classes. The training pipeline includes techniques such as stratified data splitting, advanced augmentation, and label smoothing to improve generalizability. As a result, the model achieves an overall classification accuracy of 99.18% with superior class-wise performance.
• Replaces pooling layers with spectral-aware LSBDU blocks for better feature preservation.
• Integrates Depthwise Separable Convolutions to reduce parameter count and training cost.
• Demonstrates superior generalization across all classes without overfitting.
Open Access: Yes
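Two ingredients named in the abstract above can be illustrated with simple arithmetic: the parameter saving from depthwise separable convolutions, and label smoothing. This is a sketch, not the paper's implementation. The layer shape (3 x 3 kernel, 64 input channels, 128 output channels) and the smoothing factor of 0.1 are assumed values, since the abstract gives neither; only the eight-class setting comes from the source. Bias terms are ignored in the counts.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    return k * k * c_in + c_in * c_out


def smooth_labels(class_index, num_classes, epsilon):
    """Label smoothing: mix a one-hot target with the uniform distribution."""
    uniform = epsilon / num_classes
    return [(1.0 - epsilon) * (1.0 if i == class_index else 0.0) + uniform
            for i in range(num_classes)]


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
standard = conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)

# Eight classes as in the abstract; epsilon = 0.1 is an assumed value.
target = smooth_labels(class_index=2, num_classes=8, epsilon=0.1)
```

For this assumed layer the weight count drops from 73,728 to 8,768, which is the kind of reduction the highlights refer to. The smoothed target keeps most of its mass on the true class while assigning a small, equal share to the other seven.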