Dual-modality fusion for mango disease classification using dynamic attention based ensemble of leaf & fruit images

Mohsin, Muhammad and Hashmi, Muhammad Shadab Alam and Delgado Noya, Irene and Garay, Helena and Abdel Samee, Nagwan and Ashraf, Imran (2025) Dual-modality fusion for mango disease classification using dynamic attention based ensemble of leaf & fruit images. Scientific Reports, 15 (1). ISSN 2045-2322

Full text: s41598-025-26052-7.pdf (3MB)
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Mango is one of the most beloved fruits and plays an indispensable role in the agricultural economies of many tropical countries, including Pakistan, India, and several Southeast Asian nations. Like other fruit crops, mango cultivation is threatened by various diseases, including Anthracnose and Red Rust. Although farmers try to intervene in time, early and accurate detection of mango diseases remains challenging due to multiple factors, such as limited understanding of disease diversity, similarity of symptoms, and frequent misclassification. To address these challenges, this study proposes a multimodal deep learning framework that leverages both leaf and fruit images to improve classification performance and generalization. Individual CNN-based pre-trained models, including ResNet-50, MobileNetV2, EfficientNet-B0, and ConvNeXt, were trained separately on curated datasets of mango leaf and fruit diseases. A novel Modality Attention Fusion (MAF) mechanism was introduced to dynamically weight and combine predictions from both modalities based on their discriminative strength, since some diseases are more prominent on leaves than on fruits, and vice versa. To address overfitting and improve generalization, a class-aware augmentation pipeline was integrated, which tailors augmentation to the specific characteristics of each class. The proposed attention-based fusion strategy significantly outperformed individual models and static fusion approaches, achieving a test accuracy of 99.08%, an F1 score of 99.03%, and a near-perfect ROC-AUC of 99.96% using EfficientNet-B0 as the base. To evaluate the model’s real-world applicability, an interactive web application was developed using the Django framework and evaluated through out-of-distribution (OOD) testing on diverse mango samples collected from public sources. These findings underline the importance of combining visual cues from multiple plant organs and adapting model attention to contextual features for real-world agricultural diagnostics.
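
The fusion idea described in the abstract can be illustrated with a short sketch. The code below is a hypothetical PyTorch rendering of a modality-attention fusion head, not the authors' released implementation: the class name ModalityAttentionFusion, the gating-network size, and the feature dimension of 1280 (the pooled feature size of EfficientNet-B0) are assumptions made for illustration. Two backbones produce per-modality features and logits; a small gating network predicts a softmax weight per modality, and the fused prediction is the weighted sum of the two logit vectors, so whichever organ (leaf or fruit) carries the stronger disease signal dominates the final decision.

    # Hypothetical sketch of a modality-attention fusion head (assumed design).
    import torch
    import torch.nn as nn

    class ModalityAttentionFusion(nn.Module):
        def __init__(self, feat_dim: int = 1280):
            super().__init__()
            # Gating network: concatenated pooled features from both
            # backbones -> two softmax-normalized modality weights.
            self.gate = nn.Sequential(
                nn.Linear(2 * feat_dim, 128),
                nn.ReLU(),
                nn.Linear(128, 2),
                nn.Softmax(dim=-1),
            )

        def forward(self, leaf_feat, fruit_feat, leaf_logits, fruit_logits):
            # Weights are computed per sample from both modalities' features,
            # so the model can favor the organ where symptoms are more visible.
            w = self.gate(torch.cat([leaf_feat, fruit_feat], dim=-1))  # (B, 2)
            fused = w[:, 0:1] * leaf_logits + w[:, 1:2] * fruit_logits
            return fused, w

    # Example usage: batch of 4 samples, 8 disease classes (illustrative sizes).
    maf = ModalityAttentionFusion()
    fused_logits, weights = maf(
        torch.randn(4, 1280), torch.randn(4, 1280),   # pooled leaf/fruit features
        torch.randn(4, 8), torch.randn(4, 8),         # per-modality class logits
    )

This sketch contrasts with a static fusion baseline, where the two logit vectors would simply be averaged with fixed weights; the learned gate is what makes the weighting dynamic per sample.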

Item Type: Article
Uncontrolled Keywords: Plant disease detection; Multimodal approach; Class-aware augmentation; Modality attention fusion; Out-of-distribution
Subjects: Subjects > Nutrition
Divisions: Europe University of Atlantic > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
University of La Romana > Research > Scientific Production
Depositing User: Sr Bibliotecario
Date Deposited: 09 Dec 2025 11:04
Last Modified: 09 Dec 2025 11:04
URI: http://repositorio.funiber.org/id/eprint/17885
