Statistics, Optimization & Information Computing
http://iapress.org/index.php/soic
<p><em><strong>Statistics, Optimization and Information Computing</strong></em> (SOIC) is an international refereed journal dedicated to the latest advances in statistics, optimization and applications in information sciences. Topics of interest include (but are not limited to): </p> <p>Statistical theory and applications</p> <ul> <li class="show">Statistical computing, Simulation and Monte Carlo methods, Bootstrap, Resampling methods, Spatial Statistics, Survival Analysis, Nonparametric and semiparametric methods, Asymptotics, Bayesian inference and Bayesian optimization</li> <li class="show">Stochastic processes, Probability, Statistics and applications</li> <li class="show">Statistical methods and modeling in life sciences including biomedical sciences, environmental sciences and agriculture</li> <li class="show">Decision Theory, Time series analysis, High-dimensional multivariate integrals, statistical analysis in market, business, finance, insurance, economic and social science, etc.</li> </ul> <p> Optimization methods and applications</p> <ul> <li class="show">Linear and nonlinear optimization</li> <li class="show">Stochastic optimization, Statistical optimization and Markov chains, etc.</li> <li class="show">Game theory, Network optimization and combinatorial optimization</li> <li class="show">Variational analysis, Convex optimization and nonsmooth optimization</li> <li class="show">Global optimization and semidefinite programming</li> <li class="show">Complementarity problems and variational inequalities</li> <li class="show"><span lang="EN-US">Optimal control: theory and applications</span></li> <li class="show">Operations research, Optimization and applications in management science and engineering</li> </ul> <p>Information computing and machine intelligence</p> <ul> <li class="show">Machine learning, Statistical learning, Deep learning</li> <li class="show">Artificial intelligence, Intelligent computation, Intelligent control and optimization</li> <li 
class="show">Data mining, Data analysis, Cluster computing, Classification</li> <li class="show">Pattern recognition, Computer vision</li> <li class="show">Compressive sensing and sparse reconstruction</li> <li class="show">Signal and image processing, Medical imaging and analysis, Inverse problems and imaging sciences</li> <li class="show">Genetic algorithms, Natural language processing, Expert systems, Robotics, Information retrieval and computing</li> <li class="show">Numerical analysis and algorithms with applications in computer science and engineering</li> </ul>
International Academic Press | en-US | Statistics, Optimization & Information Computing | ISSN 2311-004X
<span>Authors who publish with this journal agree to the following terms:</span><br /><br /><ol type="a"><li>Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a <a href="http://creativecommons.org/licenses/by/3.0/" target="_new">Creative Commons Attribution License</a> that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.</li><li>Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgement of its initial publication in this journal.</li><li>Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their websites) prior to and during the submission process, as this can lead to productive exchanges, as well as earlier and greater citation of published work (see <a href="http://opcit.eprints.org/oacitation-biblio.html" target="_new">The Effect of Open Access</a>).</li></ol>
Rao-Robson-Nikulin Goodness-of-fit Test Statistic for Censored and Uncensored Real Data with Classical and Bayesian Estimation
http://iapress.org/index.php/soic/article/view/1710
<p>In this work, we provide a new Pareto type-II extension for censored and uncensored real-life data. With an emphasis on the applied aspects of the model, some mathematical properties of the new distribution are derived without excess. A variety of classical methods, together with the Bayes method, are used to estimate the parameters of the new distribution. Maximum likelihood estimation under censoring is also derived. Using Pitman's proximity criterion, likelihood estimation and Bayesian estimation are contrasted. Three loss functions, namely the generalized quadratic, the Linex, and the entropy loss functions, are used to derive the Bayesian estimators. All the estimation techniques provided are evaluated through simulation studies. The BB algorithm is used to compare the censored maximum likelihood method with the Bayesian approach. With the aid of two applications and a simulation study, the construction of the Rao-Robson-Nikulin (RRN) statistic for the new model in the uncensored case is explained in detail. Additionally, the development of the Rao-Robson-Nikulin statistic for the novel model under censoring is illustrated using data from two censored applications and a simulation study.</p>Salwa L. AlKhayyat, Haitham M. Yousof, Hafida Goual, Talhi Hamida, Mohamed S. Hamed, Aiachi Hiba, Mohamed Ibrahim
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-02-24 | 13(6), 2205-2225 | 10.19139/soic-2310-5070-1710
The New Topp-Leone-Type II Exponentiated Half Logistic-Marshall-Olkin-G Family of Distributions with Applications
http://iapress.org/index.php/soic/article/view/1872
<p>In this paper, we propose a new family of generalized distributions called the Topp-Leone Type II Exponentiated Half Logistic-Marshall-Olkin-G (TL-TIIEHL-MO-G) distribution. The new distribution can be expressed as an infinite linear combination of the exponentiated-G family of distributions. Some special models of the new family of distributions are explored. Statistical properties including the quantile function, ordinary and incomplete moments, stochastic orders, probability weighted moments, the distribution of the order statistics, and Rényi entropy are presented. The maximum likelihood method is used for estimating the model parameters, and a Monte Carlo simulation is conducted to examine the performance of the model. The flexibility and importance of the new family of distributions are demonstrated by means of applications to censored and complete real data sets.</p>Broderick Oluyede, Gomolemo Jacqueline Lekono, Lesego Gabaitiri
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-19 | 13(6), 2226-2263 | 10.19139/soic-2310-5070-1872
A New Left Truncated Distribution for Modeling Failure Time Data: Estimation, Robustness Study and Application
http://iapress.org/index.php/soic/article/view/2056
<p>Truncation arises in many practical situations, such as epidemiology, material science, psychology, the social sciences, and statistics, where one wants to study data that lie above or below a certain threshold or within a specified range. Left truncation occurs when observations below a given threshold are not present in the sample. It commonly arises in employment, engineering, hydrology, insurance, reliability studies, survival analysis, etc. In this article, we develop and analyze a new left truncated distribution by truncating an asymmetric and heavy-tailed distribution, namely the Esscher transformed Laplace distribution, from the left, so that the resulting distribution lies within (b,$\infty$). Various distributional and reliability properties of the proposed distribution are investigated. A real data analysis is carried out using failure time data.</p>Krishnakumari K, Dais George
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-01 | 13(6), 2264-2277 | 10.19139/soic-2310-5070-2056
Discrimination between quantile regression models for bounded data
http://iapress.org/index.php/soic/article/view/2133
<p>Most often, when we use the term `bounded', we mean a response variable with inherent upper and lower boundaries; for instance, a proportion, or a strictly positive quantity such as income. This constraint has implications for the type of model to be used, since most traditional linear models may not respect these boundaries. Parametric quantile regression for bounded data thus provides a framework for analyzing and interpreting how a predictor of interest influences the response variable over different quantiles while respecting the bounds of the assumed distribution. In this paper, several parametric quantile regression models are explored and their performance is investigated under several conditions. Our Monte Carlo simulation results suggest that some of these parametric quantile regression models can bring significant improvement relative to other existing models under certain conditions.</p>Alla Abdul AlSattar Hammodat, Zainab Tawfiq Hamid, Zakariya Yahya Algamal
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-18 | 13(6), 2278-2293 | 10.19139/soic-2310-5070-2133
Optimizing Automobile Insurance Pricing: A Generalized Linear Model Approach to Claim Frequency and Severity
http://iapress.org/index.php/soic/article/view/2157
<p>Morocco's insurance sector, particularly auto insurance, is experiencing significant growth despite economic challenges. To remain competitive, companies must innovate and adjust their pricing to meet customer expectations and strengthen their market position. Traditionally, actuaries have used the linear model to assess the impact of explanatory variables on the frequency and severity of claims. However, this model has limitations and does not always accurately reflect the reality of claims or costs, especially in auto insurance. Our study adopts the generalized linear model (GLM) to address these shortcomings, enabling a more precise statistical analysis that better aligns with market realities. This paper examines the application of GLMs to model the total claim burden of an automobile portfolio and establish an optimal rate. The steps include data processing and analysis, segmentation of rating variables, and the selection of appropriate distributions using statistical tests such as the Wald test and the deviance test, all performed using SAS software.</p>Mekdad Slime, Abdellah Ould Khal, Abdelhak Zoglat, Mohammed El Kamli, Brahim Batti
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-04-03 | 13(6), 2294-2315 | 10.19139/soic-2310-5070-2157
Advanced Parameter Estimation for the Gompertz-Makeham Process: A Comparative Study of MMLE, PSO, CS, and Bayesian Methods
http://iapress.org/index.php/soic/article/view/2167
<p>This study investigates how to estimate the parameters of the Gompertz-Makeham process (GMP) within non-homogeneous Poisson processes (NHPP). Modified Maximum Likelihood Estimation (MMLE) is developed as an improvement over standard Maximum Likelihood Estimation (MLE) to resolve parameter estimation accuracy issues. The study combines artificial intelligence optimization, through particle swarm optimization (PSO) and cuckoo search (CS), with Bayesian estimation to assess the different methods. MMLE, PSO, CS, and the Bayesian method are evaluated using the Root Mean Square Error (RMSE), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) as statistical accuracy measures in a simulation analysis. MMLE delivers better estimation precision than PSO, CS, and the Bayesian method in this assessment. The methodology is validated by modeling operational failures at the Badoush Cement Factory and COVID-19 case occurrences in Italy, showing its capability to model failure rates alongside event occurrences. The research advances NHPP statistical estimation methods, providing a stronger analytical platform for reliability monitoring, survival model prediction, and epidemiological projection. Future research on the GMP should focus on including time-dependent elements and structural dependency mechanisms to enhance the model's capability and predictive power.</p>Adel S. Hussain, Muthanna Subhi Sulaiman, Sura Mohamed Hussein, Emad A. Az-Zo’bi, Mohammad Tashtoush
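To make the process being estimated concrete: an NHPP with a Gompertz-Makeham-type intensity can be simulated by Lewis-Shedler thinning, which is the standard way to generate the event data such estimators are fitted to. The intensity form and parameter values below are illustrative assumptions, not those fitted in the paper.

```python
import numpy as np

def simulate_nhpp_thinning(intensity, T, lam_max, seed=0):
    """Simulate an NHPP on [0, T] by Lewis-Shedler thinning,
    given an upper bound lam_max on the intensity over [0, T]."""
    rng = np.random.default_rng(seed)
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)   # candidate from a homogeneous PP
        if t > T:
            break
        if rng.uniform() < intensity(t) / lam_max:   # accept w.p. lambda(t)/lam_max
            events.append(t)
    return np.array(events)

# Gompertz-Makeham-type intensity lambda(t) = lam + a*exp(b*t)
# (illustrative parameters; increasing, so intensity(T) bounds it on [0, T])
lam, a, b = 0.5, 0.2, 0.3
intensity = lambda t: lam + a * np.exp(b * t)
T = 10.0
events = simulate_nhpp_thinning(intensity, T, lam_max=intensity(T))
```

The likelihood that MMLE or the Bayesian methods would maximize is then built from these event times and the integrated intensity over [0, T].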
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-06 | 13(6), 2316-2338 | 10.19139/soic-2310-5070-2167
Bayesian accelerated life testing models for the log-normal and gamma distributions under dual-stresses
http://iapress.org/index.php/soic/article/view/2293
<p>In this paper, a Bayesian approach to accelerated life testing models with two stressors is presented. Lifetimes are assumed to follow either a log-normal distribution or a gamma distribution, which have been mostly overlooked in the Bayesian literature when considering multiple stressors. The generalized Eyring relationship is used as the time transformation function, which allows for the use of one thermal stressor and one non-thermal stressor. Due to the mathematically intractable posteriors of these models, Markov chain Monte Carlo methods are utilized to obtain posterior samples on which to base inference. The models are applied to a real dataset, where model comparison metrics are calculated and estimates are provided of the model parameters, predictive reliability, and mean time to failure. The robustness of the models is also investigated in terms of the prior specification.</p>Neill Smit
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-21 | 13(6), 2339-2352 | 10.19139/soic-2310-5070-2293
A Novel Fréchet-Poisson Model: Properties, Applications under Extreme Reliability Data, Different Estimation Methods and Case Study on Strength-Stress Reliability Analysis
http://iapress.org/index.php/soic/article/view/2463
<p>A new compound extension of the Fréchet distribution is introduced and studied. Some of its properties, including moments, incomplete moments, probability weighted moments, the moment generating function, the stress-strength reliability model, and the residual life and reversed residual life functions, are derived. The mean squared errors (MSEs) of several estimation methods, including maximum likelihood estimation (MLE), Cramér-von Mises (CVM) estimation, bootstrap (Boot.) estimation, and the Kolmogorov estimation (KE) method, are compared for the unknown parameter via a simulation study. Two real applications are presented for comparing the estimation methods. Another two real applications are presented for comparing the competitive models. The nonparametric Hill estimator of the tail index (TIx) of the new model is computed for the breaking stress of carbon fibers data. Finally, a case study on reliability analysis of composite materials for aerospace applications is presented.</p>Mohamed Ibrahim, S. I. Ansari, Abdullah H. Al-Nefaie, Ahmad M. AboAlkhair, Mohamed S. Hamed, Haitham M. Yousof
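Two of the ingredients above, maximum likelihood fitting of a Fréchet law and the Hill tail-index estimator, can be sketched on simulated data. SciPy's `invweibull` is the two-parameter Fréchet distribution; the sample size and shape value below are arbitrary choices, and the paper's compound Fréchet-Poisson model itself is not implemented here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_shape = 2.5                     # Frechet shape = tail index
sample = stats.invweibull.rvs(true_shape, size=2000, random_state=rng)

# MLE with the location pinned at 0 (scipy's invweibull is the Frechet law)
shape_hat, loc_hat, scale_hat = stats.invweibull.fit(sample, floc=0)

def hill_estimator(x, k):
    """Hill estimator of the tail index from the k largest observations."""
    xs = np.sort(x)[::-1]
    return 1.0 / np.mean(np.log(xs[:k] / xs[k]))

alpha_hat = hill_estimator(sample, k=200)   # should be near true_shape
```

For a Fréchet sample the survival function decays like x to the power minus the shape, so the Hill estimate and the MLE of the shape target the same quantity, which is the comparison the abstract alludes to.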
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-04-18 | 13(6), 2353-2381 | 10.19139/soic-2310-5070-2463
New parameter of conjugate gradient method for unconstrained nonlinear optimization
http://iapress.org/index.php/soic/article/view/2069
<p>We are interested in the performance of nonlinear conjugate gradient methods for unconstrained optimization. In particular, we address the conjugate gradient algorithm with a strong Wolfe inexact line search. First, we study the descent property of the search direction of the considered conjugate gradient algorithm, based on a new direction obtained from a new parameter. The main objective of this parameter is to improve the speed of convergence of the resulting algorithm. Then, we present a complete study that shows the global convergence of this algorithm. Finally, we carry out comparative numerical experiments on well-known test examples to show the efficiency and robustness of our algorithm compared to other recent algorithms.</p>Mohamed Lamine Ouaoua, Samia Khelladi, Djamel Benterki
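The overall structure of such a method can be sketched as follows, using SciPy's strong-Wolfe line search. The Fletcher-Reeves formula below is only a well-known placeholder for the conjugacy parameter; the paper's new parameter is not reproduced and would replace `beta`.

```python
import numpy as np
from scipy.optimize import line_search

def nonlinear_cg(f, grad, x0, tol=1e-6, max_iter=200):
    """Nonlinear CG with a strong-Wolfe inexact line search.
    The Fletcher-Reeves beta is a placeholder for the paper's
    new conjugacy parameter."""
    x = x0.copy()
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = line_search(f, grad, x, d, gfk=g)[0]
        if alpha is None:       # line search failed: restart with steepest descent
            d = -g
            alpha = line_search(f, grad, x, d, gfk=g)[0] or 1e-4
        x = x + alpha * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves parameter
        d = -g_new + beta * d
        g = g_new
    return x

# simple convex test problem: f(x) = sum_i i * x_i^2
w = np.arange(1, 6, dtype=float)
f = lambda x: np.sum(w * x**2)
grad = lambda x: 2 * w * x
x_star = nonlinear_cg(f, grad, np.ones(5))
```

The restart to the steepest-descent direction when the line search fails is one common safeguard; the paper's descent and global-convergence analysis concerns exactly when such safeguards are unnecessary.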
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-02-24 | 13(6), 2382-2390 | 10.19139/soic-2310-5070-2069
Blockchain technology for Green Manufacturing: A Systematic Literature Review on applications, drivers, enablers and challenges
http://iapress.org/index.php/soic/article/view/2182
<p>Blockchain technology (BCT) is a promising technology for Industry 4.0, enhancing sustainability, traceability, and resilience for green manufacturing (GM) in the value chain. This literature study evaluates the existing literature on applying BCT in GM industries, with insight into the drivers, enablers, and challenges of BCT. The review is not limited to highlighting the contributions and applications of blockchain to eco-friendly manufacturing; it also takes into account the role of emerging technologies applicable to GM in Industry 4.0. In conducting this review, 113 articles were selected and analyzed in depth using bibliometric and content analysis, based on their contents, year of publication, keywords, methodology used, and the authors' recommendations. The results highlight the connection between BCT and associated technologies, including Artificial Intelligence (AI) and the Internet of Things (IoT), for enhancing GM, accounting for the drivers, enablers, and challenges of implementing BCT in GM.</p> <p>Our literature review reveals that BCT is a promising technology in this context, since it offers two main capabilities, transaction transparency and robustness, which are mandatory for GM implementation. In addition, we conclude that the majority of existing research works focus on only one or two aspects of GM and are restricted to specific industries or use cases, which limits their applicability. Finally, gaps related to standardization, Industry 4.0 implications, and the adoption of BCT were identified during this review.</p>Clement Regis Tuyishime, Asmae Abadi, Chaimae Abadi, Mohammed Abadi
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-29 | 13(6), 2391-2405 | 10.19139/soic-2310-5070-2182
On derivability criteria of h-Convex Functions
http://iapress.org/index.php/soic/article/view/2096
<p>This study pursues two main objectives. First, we generalize the derivability criterion for convex functions, which states that a differentiable function defined on an interval is convex if and only if its first derivative is monotonically increasing on that interval. We extend this concept to 'h-convexity', which generalizes convexity for nonnegative functions by allowing a function h to act on the right-hand side of the convexity inequality.</p> <p>Additionally, we consider the second criterion of convexity, which asserts that a twice-differentiable function on an interval is convex if and only if its second derivative remains non-negative on the entire interval. Our goal is to reinterpret this criterion within the framework of 'h-convexity'. Furthermore, we prove that if a non-zero function defined on the interval [0,1] is non-negative, concave, and bounded above by the identity function, then it fixes the right endpoint of the interval if and only if it is the identity function. Finally, we show, with two counterexamples, that the conjecture given by Mohammad W. Alomari (see [6]) is incorrect.</p>Mousaab Bouafia, Adnan Yassine, Thabet Abdeljawad
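For reference, the inequality the abstract alludes to, with h acting on the right-hand side, is the standard definition of h-convexity (in the sense of Varošanec):

```latex
% A function f : I -> [0, \infty) is h-convex if, for all x, y in I
% and all t in (0, 1),
\[
  f\bigl(t x + (1 - t) y\bigr) \;\le\; h(t)\, f(x) + h(1 - t)\, f(y).
\]
% Taking h(t) = t recovers ordinary convexity of a nonnegative function;
% h(t) = 1 gives P-functions, and h(t) = t^s gives s-convexity.
```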
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-21 | 13(6), 2406-2411 | 10.19139/soic-2310-5070-2096
An Application of Ensemble Stacking in Machine Learning to Predict Short-term Electricity Demand in South Africa
http://iapress.org/index.php/soic/article/view/2170
<p>The massive increase in collected data and the need for data mining and analysis have prompted the need to improve the accuracy and stability of traditional data mining and learning algorithms. This study proposes a robust stacking-ensemble algorithm for predicting the hourly electricity demand in South Africa. The proposed model has two layers: the base models and the meta-model. Four machine learning models, namely the gradient boosting machine (GBM), the deep neural network (DNN), the generalised linear model (GLM), and the random forest (RF), make up the base models. Output from the base models is integrated using ensemble stacking to form the meta-model. The stacking-ensemble (SE) model predicts South Africa's hourly electricity demand. The performance of the models is tested over different forecasting horizons. The prediction performance of the stacking-ensemble model is compared with that of each of the base models using the root mean square error (RMSE), the mean absolute error (MAE), and the mean square error (MSE). In addition, the Giacomini-White test is used to identify the dominant model. Results showed that the RF model produced the most accurate predictions in all the forecasting horizons. The order of dominance is as follows: RF > SE > GBM > GLM. Thus, RF demonstrates the highest predictive capability, dominating the other models. The stacking-ensemble model produced the second most accurate results, with its results in the shortest forecasting horizon almost equal to those of the RF model. Thus, in this context, the stacking ensemble performs better than three of the four base models. The proposed model produces a reasonable and accurate prediction of hourly electricity demand, which is strategically significant in planning and formulating electricity load-shedding strategies in South Africa or any other country.</p>Claris Shoko, Caston Sigauke, Katleho Makatjane
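The two-layer structure described above can be sketched with scikit-learn's `StackingRegressor`, which fits the base models on cross-validated folds and trains the meta-model on their out-of-fold predictions. The synthetic data, hyperparameters, and the MLP standing in for the DNN are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# synthetic stand-in for the hourly demand data (the paper's data set
# and exact model configurations are not reproduced here)
X, y = make_regression(n_samples=600, n_features=8, noise=10.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

base_models = [
    ("gbm", GradientBoostingRegressor(random_state=1)),
    ("dnn", MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                         random_state=1)),   # small MLP standing in for the DNN
    ("glm", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=1)),
]
# meta-model learns to combine the cross-validated base predictions
stack = StackingRegressor(estimators=base_models,
                          final_estimator=LinearRegression(), cv=5)
stack.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, stack.predict(X_te)) ** 0.5
```

Comparing `rmse` against each base model fitted alone, over several forecast horizons, mirrors the evaluation protocol the abstract describes.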
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-28 | 13(6), 2412-2433 | 10.19139/soic-2310-5070-2170
Reflexive Edge Strength in Certain Graphs with Dominant Vertex
http://iapress.org/index.php/soic/article/view/2210
<p>Consider a simple, connected graph $G$ with edge set $E(G)$ and vertex set $V(G)$. A total $k$-labeling, with $k=\max\{k_e, 2k_v\}$, consists of two functions: $f_e$, from the edge set to $\{1, 2, \ldots, k_e\}$, and $f_v$, from the vertex set to the non-negative even numbers up to $2k_v$. The total $k$-labeling is an \textit{edge irregular reflexive $k$-labeling} of the graph $G$ if, for every two different edges $x_1x_2$ and $x_1'x_2'$ of $G$, $wt(x_1x_2) \neq wt(x_1'x_2')$, where $wt(x_1x_2)=f_v(x_1)+f_e(x_1x_2)+f_v(x_2)$. The reflexive edge strength of a graph $G$, denoted $res(G)$, is defined as the minimal $k$ for which $G$ admits an edge irregular reflexive $k$-labeling. In this work, $res(G)$ is determined for the book, triangular book, Jahangir, and helm graphs.</p>Marsidi, Dafik, Susanto, Arika Indah Kristiana, Ika Hesti Agustin, M. Venkatachalam
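The defining condition is easy to check mechanically, and for tiny graphs the reflexive edge strength can even be found by brute force. The path graph below is only a toy illustration, not one of the families treated in the paper.

```python
from itertools import product

def weights(f_v, f_e, edges):
    """Edge weights wt(uv) = f_v(u) + f_e(uv) + f_v(v)."""
    return [f_v[u] + f_e[(u, v)] + f_v[v] for (u, v) in edges]

def is_irregular(f_v, f_e, edges):
    """A labeling is edge irregular reflexive iff all weights differ."""
    w = weights(f_v, f_e, edges)
    return len(w) == len(set(w))

def res_bruteforce(vertices, edges):
    """Smallest k = max(k_e, 2*k_v) admitting an edge irregular
    reflexive k-labeling (exponential search; tiny graphs only)."""
    k = 1
    while True:
        vertex_labels = range(0, k + 1, 2)   # even labels 0, 2, ..., 2*k_v <= k
        edge_labels = range(1, k + 1)        # edge labels 1, ..., k_e <= k
        for fv in product(vertex_labels, repeat=len(vertices)):
            f_v = dict(zip(vertices, fv))
            for fe in product(edge_labels, repeat=len(edges)):
                f_e = dict(zip(edges, fe))
                if is_irregular(f_v, f_e, edges):
                    return k, f_v, f_e
        k += 1

# path P3: a - b - c, two edges
k, f_v, f_e = res_bruteforce(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

For P3 the search fails at k=1 (both weights equal 1) and succeeds at k=2, which is the kind of lower-bound/construction argument the paper carries out analytically for its graph families.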
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-28 | 13(6), 2434-2447 | 10.19139/soic-2310-5070-2210
Intelligent operation of photovoltaic generators in isolated AC microgrids to reduce costs and improve operating conditions
http://iapress.org/index.php/soic/article/view/2247
<p>This paper addresses the challenges associated with optimizing the operation of photovoltaic distributed generators in isolated electrical microgrids. With the aim of reducing energy production and system maintenance costs and improving the microgrid operating conditions, a master-slave methodology is proposed. In the master stage, the problem of intelligently injecting active power from photovoltaic generators is solved using the continuous versions of four optimization techniques: the Monte Carlo method, the Chu & Beasley genetic algorithm, the population genetic algorithm, and the particle swarm optimizer. Meanwhile, the slave stage evaluates the solutions proposed by the master stage by solving an hourly power flow problem based on the successive approximations method. The proposed solution methodologies are validated in two test scenarios of 10 and 27 buses to select the one with the best performance. Then, the most efficient methodology is implemented in a real isolated grid located in Huatacondo, Chile. This validation aims to assess its ability to optimize the operation of photovoltaic generators in isolated microgrids, considering variations in power generation and demand across the different seasons of the year. The study underscores the importance of financial considerations in achieving an efficient and economically viable operation of photovoltaic generation systems. Furthermore, it provides valuable input for successfully integrating non-conventional renewable energy sources into isolated electrical microgrids.</p>Catalina Díaz Cáceres, Luis Fernando Grisales Noreña, Brandon Cortés-Caicedo, Jhony Andrés Guzmán-Henao, Rubén Iván Bolaños, Oscar Danilo Montoya Giraldo
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-04-13 | 13(6), 2448-2476 | 10.19139/soic-2310-5070-2247
Function Representation in Hilbert Spaces Using Haar Wavelet Series
http://iapress.org/index.php/soic/article/view/2288
<p>This work explores the application of integral transforms using scale and Haar wavelet functions to numerically represent a function \( f(t) \). It is based on defining a vector space in which any function can be represented as a linear combination of orthogonal basis functions. In this case, the Haar wavelet transform is used, employing Haar functions generated from scale functions. First, the fundamental mathematical concepts, such as Hilbert spaces and orthogonality, necessary for understanding the Haar wavelet transform are presented. Then, the construction of the scale and Haar wavelet functions and the process for determining the coefficients of the function representation are detailed. The methodology is applied to the function \( f(t) = t^2 \) over the interval \( t \in [-3, 3] \), showing how to calculate the series coefficients for different resolution levels. As the resolution level increases, the approximation of \( f(t) \) improves significantly. Furthermore, the representation of the function \( f(t) = \sin(t) \) over the interval \( t \in [-6, 6] \) using the Haar wavelet series is presented.</p>Andres Felipe Camelo, Carlos Alberto Ramírez, José Rodrigo González
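A useful fact for checking such computations is that the level-J partial sum of the Haar series equals the average of f over each of the 2^J dyadic cells, so the projection can be computed without expanding the wavelet coefficients one by one. The sketch below applies this to the paper's example f(t) = t² on [-3, 3]; the grid size and levels are arbitrary illustrative choices.

```python
import numpy as np

def haar_projection(f, a, b, level, n_grid=2**12):
    """Project f on [a, b] onto the Haar approximation space V_level
    (piecewise constants on 2**level dyadic cells). The level-J partial
    Haar sum equals the cell averages, which we compute directly on a
    midpoint quadrature grid."""
    t = a + (b - a) * (np.arange(n_grid) + 0.5) / n_grid   # midpoints in (a, b)
    u = (t - a) / (b - a)                                  # mapped to (0, 1)
    fv = f(t)
    cells = np.floor(u * 2**level).astype(int)
    approx = np.empty_like(fv)
    for c in range(2**level):
        mask = cells == c
        approx[mask] = fv[mask].mean()   # cell average = partial Haar sum
    return t, fv, approx

f = lambda t: t**2
_, fv4, a4 = haar_projection(f, -3.0, 3.0, level=4)
_, fv7, a7 = haar_projection(f, -3.0, 3.0, level=7)
err4 = np.max(np.abs(fv4 - a4))   # error at resolution level 4
err7 = np.max(np.abs(fv7 - a7))   # error shrinks as the level grows
```

Each extra level halves the cell width, so for a smooth function the maximum error roughly halves as well, which is the improvement with resolution the abstract reports.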
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-06 | 13(6), 2477-2486 | 10.19139/soic-2310-5070-2288
Fuzzy Volterra Integral Equation Approximate Solution Via Optimal Homotopy Asymptotic Methods
http://iapress.org/index.php/soic/article/view/2302
<p>The field of fuzzy integral equations (FIEs) is significant for modeling complex, time-delayed, and uncertain physical phenomena. Nevertheless, most current solution methods for FIEs encounter considerable challenges, such as the inability to manage intricate fuzzy functions, stringent assumptions regarding the forms of the fuzzy operations utilized, and numerical instability in highly nonlinear problems. Moreover, traditional methods are limited in their capability to produce precise or reliable outcomes for practical applications, and when they can, they incur substantial computing expenses. These challenges underscore the demand for more effective and efficient methodologies. This study addresses that demand by developing two approximate analytical techniques to solve FIEs, namely the optimal homotopy asymptotic method (OHAM) and the multistage optimal homotopy asymptotic method (MOHAM). A novel formulation of fuzzy OHAM and MOHAM is introduced by integrating the fundamental concepts of these methodologies with fuzzy set theory and optimization techniques. OHAM and MOHAM are then further formulated to solve second-kind linear Volterra fuzzy integral equations (VFIEs). These methods are named the fuzzy Volterra optimal homotopy asymptotic method (FV-OHAM) and the fuzzy Volterra multistage optimal homotopy asymptotic method (FV-MOHAM), respectively. In two linear examples, FV-MOHAM and FV-OHAM generated significantly more accurate results than other existing methods. A thorough assessment is performed to evaluate their effectiveness and practical use, potentially aiding the solution of complex problems across several scientific and engineering fields.</p>Alzubi Muath Talal Mahmoud, Farah Aini Abdullah, Ali Fareed Jameel, Adila Aida Azahar
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-10 | 13(6), 2487-2510 | 10.19139/soic-2310-5070-2302
Numerical Solution of the Lotka-Volterra Stochastic Differential Equation
http://iapress.org/index.php/soic/article/view/2307
<p>This paper presents the modeling of the Lotka-Volterra stochastic differential equation and introduces the application of two numerical methods to approximate the solution of this stochastic model. The methods used to solve the stochastic differential equation are the Euler-Maruyama method and the Milstein method. Additionally, a methodology is presented for obtaining the parameters of the predator-prey model equation from empirical data collected over a fixed period of time.</p>Erisbey Marín Cardona, Carlos Alberto Ramírez-Vanegas, José Rodrigo González Granada
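A minimal Euler-Maruyama sketch for one common stochastic Lotka-Volterra formulation is shown below. The multiplicative-noise form, parameter values, and initial conditions are illustrative assumptions; the paper's exact drift and diffusion specification may differ, and the Milstein method would add the usual correction term involving the derivative of the diffusion coefficient.

```python
import numpy as np

def euler_maruyama_lv(x0, y0, alpha, beta, delta, gamma, sigma1, sigma2,
                      T=10.0, n_steps=10_000, seed=0):
    """Euler-Maruyama for a stochastic Lotka-Volterra system with
    multiplicative noise:
      dX = X (alpha - beta Y) dt + sigma1 X dW1   (prey)
      dY = Y (delta X - gamma) dt + sigma2 Y dW2  (predator)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, y0
    for k in range(n_steps):
        dW1, dW2 = rng.normal(0.0, np.sqrt(dt), size=2)  # Brownian increments
        # drift terms are the classical predator-prey dynamics
        x[k + 1] = x[k] + x[k] * (alpha - beta * y[k]) * dt + sigma1 * x[k] * dW1
        y[k + 1] = y[k] + y[k] * (delta * x[k] - gamma) * dt + sigma2 * y[k] * dW2
    return x, y

x, y = euler_maruyama_lv(10.0, 5.0, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4,
                         sigma1=0.05, sigma2=0.05)
```

Averaging many such paths, or matching simulated moments to observed counts, is one route to the parameter-estimation step the abstract mentions.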
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-04 | 13(6), 2511-2520 | 10.19139/soic-2310-5070-2307
Jackson’s theorem for the Kontorovich-Lebedev-Clifford transform
http://iapress.org/index.php/soic/article/view/2397
<p>In this paper, using the Kontorovich-Lebedev-Clifford translation operators recently studied by A. Prasad and U. K. Mandal (The Kontorovich-Lebedev-Clifford transform, Filomat 35:14 (2021), 4811–4824), we prove Jackson's theorem associated with the Kontorovich-Lebedev-Clifford transform.</p>Yassine Fantasse, Abdellatif Akhlidj
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-29 | 13(6), 2521-2528 | 10.19139/soic-2310-5070-2397
From Extraction to Reasoning: A Systematic Review of Algorithms in Multi-Document Summarization and QA
http://iapress.org/index.php/soic/article/view/2398
<p>Multi-document summarization and question-answering (QA) have become pivotal tasks in Natural Language Processing (NLP), facilitating information extraction and decision-making across various domains. This systematic review explores the evolution of algorithms used in these tasks, providing a comprehensive taxonomy of traditional, modern, and emerging approaches. We examine the progression from early extractive methods, such as TF-IDF and TextRank, to the advent of neural models like BERT, GPT, and T5, and the integration of retrieval-augmented generation (RAG) for QA. Hybrid models combining traditional techniques with neural approaches and graph-based methods are also discussed. Through a detailed analysis of algorithmic frameworks, we identify key strengths, weaknesses, and challenges in current methodologies. Additionally, the review highlights recent trends such as unified models, multimodal algorithms, and the application of reinforcement learning in summarization and QA tasks. We also explore the real-world relevance of these algorithms in sectors such as news, legal, medical, and education. The paper concludes by outlining open research directions, proposing new evaluation frameworks, and emphasizing the need for cross-task annotations and ethical considerations in future algorithmic development.</p>Emmanuel Efosa-Zuwa, Olufunke Oladipupo, Jelili Oyelade
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-03-15 | 13(6), 2529-2559 | 10.19139/soic-2310-5070-2398
Randomized Algorithms for Low-Rank Tensor Completion in TT-Format
http://iapress.org/index.php/soic/article/view/2483
<p>Tensor completion is a crucial technique for filling in missing values in multi-dimensional data. It relies on the assumption that such datasets have intrinsic low-rank properties, leveraging this to reconstitute the dataset using low-rank decomposition or other strategies. Traditional approaches often lack computational efficiency, particularly with singular value decomposition (SVD) for large-scale tensors. Furthermore, fixed-rank SVD methods struggle with determining a suitable initial rank when data are incomplete. This paper introduces two novel randomized algorithms designed for low-rank tensor completion in tensor train (TT) format, named TTrandPI and FPTT. The TTrandPI algorithm integrates randomized TT decomposition with power iteration techniques, thereby enhancing computational efficiency and accuracy by improving spectral decay and minimizing tail energy build-up. Meanwhile, the FPTT algorithm utilizes a fixed-precision low-rank approximation approach that adaptively selects tensor ranks based on error tolerance levels, thus reducing the dependence on a predetermined rank. By conducting numerical experiments on synthetic data, color images, and video sequences, both algorithms exhibit superior performance compared to some existing methods.</p>Yihao Pan, Congyi Yu, Chaoping Chen, Gaohang Yu
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-04-27 | 2025-04-27 | Vol. 13 No. 6, pp. 2560–2574 | DOI: 10.19139/soic-2310-5070-2483
Hybrid Butterfly-Grey Wolf Optimization (HB-GWO): A Novel Metaheuristic Approach for Feature Selection in High-Dimensional Data
http://iapress.org/index.php/soic/article/view/2617
<p>Feature selection is a critical preprocessing step in high-dimensional data analysis, aiming to enhance model performance by eliminating irrelevant and redundant features. This paper introduces a novel hybrid metaheuristic algorithm, the Hybrid Butterfly-Grey Wolf Optimization (HB-GWO), which synergizes the global exploration capabilities of the Butterfly Optimization Algorithm (BOA) with the local exploitation strengths of the Grey Wolf Optimizer (GWO) to achieve an effective balance between exploration and exploitation in feature selection tasks. The algorithm incorporates an adaptive switching mechanism that dynamically adjusts the contribution of BOA and GWO throughout the optimization process. HB-GWO was evaluated on multiple benchmark datasets, including Breast Cancer, Madelon, Colon Cancer, and Arrhythmia, using a Random Forest classifier as the evaluation model. Experimental results demonstrate that HB-GWO consistently outperforms state-of-the-art metaheuristic algorithms (GA, PSO, BOA, GWO) in classification accuracy, feature reduction rate, and computational efficiency. An ablation study further confirms the contribution of each component of the hybrid algorithm. These findings position HB-GWO as a robust and efficient method for feature selection in high-dimensional data analysis.</p>Mohammed Aly, Abdullah Shawan Alotaibi
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 | 2025-05-28 | Vol. 13 No. 6, pp. 2575–2600 | DOI: 10.19139/soic-2310-5070-2617
Forecasting Scientific Impact: A Model for Predicting Citation Counts
http://iapress.org/index.php/soic/article/view/2524
<p>Forecasting the citation counts of scientific papers is a challenging task, particularly when utilizing textual data such as author names, paper titles, abstracts, and affiliations. This task diverges from conventional regression problems involving numerical or categorical inputs, as it demands the processing of complex, high-dimensional text features. Traditional regression techniques, including Linear Regression, Polynomial Regression, and Decision Tree Regression, often fail to encapsulate the semantic intricacies of textual data and are susceptible to overfitting due to the expansive feature space. In the context of Vietnam, where research output is rapidly growing yet underexplored in predictive modeling, these limitations are especially pronounced. To tackle these issues, we leverage advanced Natural Language Processing (NLP) techniques, employing Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. These deep learning models are adept at handling sequential data, capturing long-range dependencies, and preserving contextual nuances, rendering them well-suited for text-based citation prediction. We conducted experiments using a dataset of academic papers authored by Vietnamese researchers across diverse disciplines, sourced from publications featuring Vietnamese author contributions. The dataset includes features such as author names, titles, abstracts, and affiliations, reflecting the unique characteristics of Vietnam’s research landscape. We compared the performance of LSTM and GRU models against traditional machine learning approaches, evaluating prediction accuracy with metrics like Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The results reveal that LSTM and GRU models substantially outperform their traditional counterparts. The LSTM model achieved an RMSE of 8.54 and an MAE of 8.1, while the GRU model yielded an RMSE of 8.32 and an MAE of 7.83, demonstrating robust predictive capabilities. In contrast, traditional models such as Decision Tree Regression and Linear Regression exhibited higher error rates, with RMSEs exceeding 12.0. These findings underscore the efficacy of deep learning in forecasting citation counts from textual data, particularly for Vietnamese research outputs, and highlight the potential of LSTM and GRU models to uncover intricate patterns driving scientific impact in emerging research ecosystems.</p>Bao T. Nguyen, Thinh T. Nguyen
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 | 2025-05-28 | Vol. 13 No. 6, pp. 2601–2615 | DOI: 10.19139/soic-2310-5070-2524
A New Approach of Multiple Merger and Acquisition (M&A) in AR Time Series Model under Bayesian Framework
http://iapress.org/index.php/soic/article/view/2029
<p>Mergers and acquisitions (M&amp;As) play a pivotal role in fostering economic development and are extensively examined worldwide across various empirical contexts, notably in the banking sector. The primary objective of this study is to introduce a novel approach termed the multiple-merger autoregressive (MM-AR) model, aimed at providing insights into the effects of mergers on model parameters and behaviour. Initially, we propose a comprehensive estimation framework utilizing posterior parameters within the Bayesian paradigm, incorporating diverse loss functions to enhance robustness. A distinctive feature of this model is that it also handles the situation in which multiple series are merged into the same observed series at various time points. The Bayesian estimation approach is used to assess the MM-AR model parameters in terms of MSE, AB, and AE, with good results. Under Bayesian estimation, the squared error loss function (SELF) performs better than the other estimators for most of the parameters. Subsequently, we compute the Bayes factor to quantify the impact of merged series on the overall model dynamics. To further elucidate the efficacy of the proposed model, we conduct both simulation-based analyses and real-world applications focusing on the Indian banking sector. Through this research, we aim to offer valuable insights into the implications of M&amp;A activities. For the purpose of data analysis, we used PCR banking data of ICICI Bank Ltd. for simulation and empirical analysis to verify the model's applicability and purpose.</p>Jitendra Kumar, Mohd Mudassir
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-26 | 2025-05-26 | Vol. 13 No. 6, pp. 2616–2633 | DOI: 10.19139/soic-2310-5070-2029
On the Use of Yeo-Johnson Transformation in the Functional Multivariate Time Series
http://iapress.org/index.php/soic/article/view/1569
<p>In this paper, the Box-Cox and Yeo-Johnson transformation models are employed, together with density-function-based nonparametric methods, to improve multivariate time series forecasting. Our model uses the K-Nearest Neighbor function, with automatic bandwidth selection via a cross-validation approach and semi-metrics to measure the proximity of functional data. Principal component analysis is then applied to decorrelate the multivariate response variables. The methodology was applied to two time series examples with multiple responses. The first example comprises three time series datasets of the monthly averages of Humidity (H), Rainfall (R), and Temperature (T); the second example consists of simulation studies. Mean square errors of the predicted values were calculated to assess forecast efficiency. The results show that applying the multivariate nonparametric time series analysis to stationary datasets transformed with the Yeo-Johnson model is more efficient than applying univariate nonparametric analysis to each response independently.</p>Sameera Abdulsalam Othman, Haithem Taha Mohammed Ali
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-04-09 | 2025-04-09 | Vol. 13 No. 6, pp. 2634–2646 | DOI: 10.19139/soic-2310-5070-1569
Bayesian Premium Estimators for NXLindley Model Under Different Loss Functions
http://iapress.org/index.php/soic/article/view/2442
<p>The conditional distribution of (X|θ) is regarded as the NXLindley distribution. This study centers on the estimation of the Bayesian premium under the symmetric squared error loss function and the asymmetric Linex loss function, employing the extension of Jeffreys' prior as a non-informative prior and the Gamma prior as an informative prior. Owing to its complexity and lack of linearity, we rely on a numerical approximation to establish the Bayesian premium. A simulation and comparison study with several sample sizes is presented.</p>Ahmed Sadoun, Imen Ouchen, Farouk Metiri
Copyright (c) 2025 Statistics, Optimization & Information Computing
2025-05-28 | 2025-05-28 | Vol. 13 No. 6, pp. 2647–2668 | DOI: 10.19139/soic-2310-5070-2442