Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential

Jamil, Azhar and Rehman, Saif Ur and Mahmood, Khalid and Gracia Villar, Mónica (monica.gracia@uneatlantico.es) and Prola, Thomas (thomas.prola@uneatlantico.es) and Diez, Isabel De La Torre and Samad, Md Abdus and Ashraf, Imran (2024) Deep Learning Approaches for Image Captioning: Opportunities, Challenges and Future Potential. IEEE Access. p. 1. ISSN 2169-3536

Deep_Learning_Approaches_for_Image_Captioning_Opportunities_Challenges_and_Future_Potential.pdf
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Generative intelligence relies heavily on the integration of vision and language. Much of the research has focused on image captioning, which involves describing images with meaningful sentences. Sentences that describe the visual content are typically generated with a vision encoder and a language model. These components have advanced substantially over the years through the incorporation of object regions, attributes, multi-modal connections, attention techniques, and early fusion approaches like bidirectional encoder representations from transformers (BERT). This research offers a reference to the body of literature, identifies emerging trends in an area that blends computer vision and natural language processing to maximize their complementary effects, and identifies the most significant technological improvements in architectures employed for image captioning. It also discusses various problem variants and open challenges. By comparing techniques, architectures, and training strategies and identifying their most significant technical innovations, it allows an objective assessment of them and offers valuable insights into the current landscape of image captioning research.

Item Type: Article
Uncontrolled Keywords: Image captioning, deep learning, image processing, artificial intelligence
Subjects: Subjects > Engineering
Divisions: Europe University of Atlantic > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
University of La Romana > Research > Scientific Production
Depositing User: Sr Bibliotecario
Date Deposited: 29 Feb 2024 13:09
Last Modified: 29 Feb 2024 13:09
URI: http://repositorio.funiber.org/id/eprint/11065