Indonesian News Extractive Summarization using Lexrank and YAKE Algorithm

  • Julyanto Wijaya Student
  • Abba Suganda Girsang

Abstract

The surge in global technological advancements has led to an unprecedented volume of information sharing across diverse platforms. This information, easily accessible through browsers, has created an overload, making it challenging for individuals to efficiently extract essential content. In response, this paper proposes a hybrid Automatic Text Summarization (ATS) method that combines the LexRank and YAKE algorithms. LexRank determines sentence scores, while YAKE calculates individual word scores; together they enhance summarization accuracy. Using an unsupervised learning approach, the hybrid model demonstrates a 2% improvement over its base model. To validate the effectiveness of the proposed method, the paper uses 5000 Indonesian news articles from the Indosum dataset. Ground-truth summaries are employed, with the objective of condensing each article to 30% of its content. The algorithmic approach and experimental results are presented, offering a promising solution to information overload. Notably, the results reveal a two percent improvement in the ROUGE-1 and ROUGE-2 scores, along with a one percent improvement in the ROUGE-L score. These findings underscore the potential of incorporating a keyword score to enhance the overall accuracy of the summaries generated by LexRank. Although no machine learning model is trained in this experiment, the unsupervised and heuristic approach suggests broader applications on a global scale. A comparative analysis with other state-of-the-art text summarization methods or hybrid approaches will be essential to gauge its overall effectiveness.
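The idea of combining a graph-based sentence score with a keyword score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two scores are combined, so the weighted sum and the `alpha` parameter are assumptions, as are all function names. The LexRank part is simplified to plain term-frequency cosine similarity with power iteration, and the keyword part is a frequency-based stand-in rather than the actual YAKE scoring.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a real system would also drop Indonesian stopwords."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Simplified LexRank: power iteration over a cosine-similarity graph."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / row_sums[j] * scores[j]
                            for j in range(n) if row_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def keyword_scores(sentences):
    """Stand-in for YAKE (assumption): mean relative frequency of a sentence's words."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(t for sent in tokens for t in sent)
    top = max(freq.values()) if freq else 1
    return [sum(freq[t] / top for t in sent) / len(sent) if sent else 0.0
            for sent in tokens]


def summarize(sentences, ratio=0.3, alpha=0.5):
    """Keep roughly `ratio` of the sentences, ranked by a weighted sum of both scores.

    `alpha` (hypothetical) weights the sentence score against the keyword score.
    """
    if not sentences:
        return []
    lex = lexrank_scores(sentences)
    key = keyword_scores(sentences)
    max_lex = max(lex) or 1.0
    max_key = max(key) or 1.0
    combined = [alpha * l / max_lex + (1 - alpha) * k / max_key
                for l, k in zip(lex, key)]
    keep = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]  # restore original order
```

Calling `summarize(sentences)` on a list of ten sentences returns three of them in their original order, mirroring the 30% target described above. Swapping in real LexRank and YAKE implementations would only change the two scoring functions.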
Published
2024-06-07
How to Cite
Wijaya, J., & Suganda Girsang, A. (2024). Indonesian News Extractive Summarization using Lexrank and YAKE Algorithm. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-1976
Section
Scientific Report