Large Language Models and Multimodal AI
Abstract
Large Language Models (LLMs) have transformed natural language processing, demonstrating remarkable capabilities in text understanding, generation, and reasoning. Recent advances extend these models into multimodal AI, integrating multiple data modalities, such as text, images, audio, and video, into unified learning frameworks. Multimodal AI systems leverage LLMs to process and correlate information across modalities, improving contextual understanding, task flexibility, and human–computer interaction. These models find applications in image captioning, visual question answering, video summarization, conversational AI, and cross-modal retrieval. Despite their promise, challenges remain, including high computational requirements, alignment of heterogeneous modalities, limited interpretability, and ethical concerns. This paper examines the architecture, capabilities, applications, and limitations of LLMs and multimodal AI, highlighting their potential to enable more robust, context-aware, and interactive artificial intelligence systems.
Article Details

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.