A Hybrid Framework for Arabic Extractive Document Summarization Using Pre-Trained Language Models and Topic Modeling
Abstract
Automatic text summarization remains one of the most challenging tasks in Natural Language Processing (NLP), and it is especially difficult for Arabic, a language with rich morphology, dense semantic content, and syntactic ambiguity. This study proposes a hybrid approach to Arabic extractive summarization that combines the strengths of pre-trained language models (PLMs) with topic modeling techniques to produce summaries that are accurate, comprehensive, and semantically coherent. The proposed model uses an Arabic pre-trained language model, such as AraBERT, to obtain deep contextual sentence representations and to assess how each sentence relates to the document as a whole, while BERTopic or Latent Dirichlet Allocation (LDA) uncovers the document's latent topics, ensuring that the summary covers all of its key points. The system selects the most representative sentences by combining the semantic and topical signals without sacrificing readability. We evaluate the proposed approach using both standard automatic metrics (ROUGE-N and ROUGE-L) and human judgments of content quality and coherence. The results show that the hybrid system substantially outperforms purely statistical and purely deep learning baselines for Arabic summarization, improving information access and advancing Arabic NLP applications. On an Arabic news dataset, the proposed hybrid approach achieves superior ROUGE-1, ROUGE-2, and ROUGE-L scores compared to baseline extractive methods, indicating summaries that are both more coherent and more comprehensive.
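The sentence-selection idea described above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: toy bag-of-words vectors stand in for AraBERT sentence embeddings, a keyword set stands in for the topics that BERTopic or LDA would discover, and the function names and the mixing weight `alpha` are hypothetical. The sketch only shows the combination step, i.e. scoring each sentence by a weighted sum of a semantic score (similarity to a document-level representation) and a topical score (coverage of topic terms), then extracting the top-k sentences in their original order.

```python
# Illustrative sketch of hybrid extractive scoring (assumed design, not the
# paper's code): semantic relevance + topic coverage -> top-k sentence selection.
from collections import Counter
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def summarize(sentences, topic_keywords, k=2, alpha=0.7):
    # Toy stand-in for PLM embeddings: term-count vectors per sentence.
    vecs = [Counter(s.split()) for s in sentences]
    centroid = sum(vecs, Counter())  # document-level representation
    scores = []
    for i, (s, v) in enumerate(zip(sentences, vecs)):
        semantic = cosine(v, centroid)  # how central the sentence is
        tokens = s.split()
        # Fraction of the sentence's tokens that are topic terms.
        topical = len(set(tokens) & topic_keywords) / max(len(tokens), 1)
        scores.append((alpha * semantic + (1 - alpha) * topical, i))
    # Take the k highest-scoring sentences, then restore document order.
    chosen = sorted(i for _, i in sorted(scores, reverse=True)[:k])
    return [sentences[i] for i in chosen]

sents = [
    "the model encodes each sentence",
    "topic modeling finds latent themes",
    "the weather was pleasant today",
]
topics = {"model", "topic", "sentence", "themes"}
summary = summarize(sents, topics, k=2)
```

In this toy example the off-topic third sentence is as "central" as the first by the bag-of-words measure, and it is the topical term that tips selection toward the two on-topic sentences — the coverage benefit the hybrid design aims for. In the actual system the semantic score would come from contextual AraBERT representations and the topical score from LDA/BERTopic topic assignments.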
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.