Image to Label to Answer: An Efficient Framework for Enhanced Clinical Applications in Medical Visual Question Answering
Journal article   Open access   Peer reviewed

Jianfeng Wang, Kah Phooi Seng, Yi Shen, Li-Minn Ang and Difeng Huang
Electronics, Vol.13(12), pp.1-12
2024
Published Version, CC BY 4.0, Open Access

Abstract

Keywords: medical visual question answering (Med-VQA); large language models (LLMs); multi-label learning; attention mechanisms; zero-shot learning
Medical Visual Question Answering (Med-VQA) is held back in practical application because data are scarce and difficult to acquire. Existing approaches rely on multi-modal learning to equip models with both medical image inference and natural language understanding, but this worsens data scarcity in Med-VQA and hinders clinical application and advancement. This paper proposes Image to Label to Answer (ITLTA), a Med-VQA framework designed around the requirements of the field. ITLTA combines multi-label learning on medical images with the language understanding and reasoning capabilities of large language models (LLMs), so the natural language module operates zero-shot and needs no end-to-end training. This reduces deployment costs and training data requirements and lets LLMs serve as flexible, plug-and-play modules. To improve multi-label classification accuracy, the framework pretrains on external medical image data and integrates a joint feature and label attention mechanism, which keeps performance robust even with limited data. The framework also makes the path from visual labels and question prompts to the final answer explicit, improving the interpretability of Med-VQA. Validated on the VQA-Med 2019 dataset, the method outperforms existing methods, supporting its suitability for clinical applications.
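
The image-to-label-to-answer flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`select_labels`, `build_prompt`, `answer_question`), the 0.5 threshold, the label strings, and the prompt wording are all assumptions made for illustration, and the classifier and LLM are stand-ins supplied by the caller.

```python
from typing import Callable, Dict, List


def select_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label decision: keep every label whose score reaches the threshold.

    `scores` stands in for the per-label probabilities of an image classifier;
    the threshold value is an assumption, not taken from the paper.
    """
    kept = [label for label, prob in scores.items() if prob >= threshold]
    # Most confident labels first; ties broken alphabetically for determinism.
    return sorted(kept, key=lambda label: (-scores[label], label))


def build_prompt(labels: List[str], question: str) -> str:
    """Turn predicted labels plus the question into a text prompt (hypothetical wording)."""
    findings = ", ".join(labels) if labels else "no confident findings"
    return (
        "You are assisting with a medical visual question answering task.\n"
        f"Image labels predicted by the classifier: {findings}.\n"
        f"Question: {question}\n"
        "Answer concisely using only the labels above."
    )


def answer_question(
    scores: Dict[str, float],
    question: str,
    llm: Callable[[str], str],
    threshold: float = 0.5,
) -> str:
    """Image scores -> labels -> prompt -> answer; `llm` is any text-in, text-out callable."""
    labels = select_labels(scores, threshold)
    return llm(build_prompt(labels, question))


if __name__ == "__main__":
    # Made-up scores for demonstration only.
    demo_scores = {"modality: CT", }  and None  # placeholder removed below
```

Because the LLM is passed in as a plain callable, it can be swapped without retraining the image classifier, which is the plug-and-play property the abstract emphasizes. The intermediate labels and prompt are ordinary values that can be logged or inspected, which is one way to read the interpretability claim.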
