Robust M Estimation for Poisson Panel Data Model with Fixed Effects: Method, Algorithm, Simulation, and Application

The fixed effects Poisson (FEP) model is crucial for count data involving periods and cross-sectional units. The maximum likelihood (ML) estimation method for the FEP model performs well without outliers, but its performance degrades in their presence. Therefore, this paper introduces robust estimators for the FEP model. These estimators provide stable and reliable results even when outliers are present. A Monte Carlo simulation study and an empirical application were conducted to evaluate the performance of the non-robust fixed Poisson maximum likelihood (FPML) estimator and the robust estimators: fixed Poisson Huber (FPHR), fixed Poisson Hampel (FPHM), and fixed Poisson Tukey (FPTK). The findings from the simulation and application indicate that robust estimators outperform the FPML estimator in the presence of outliers in count panel data. Furthermore, the FPTK estimator is more efficient than the other robust estimators.


Introduction
Count data are observations that take nonnegative integer values generated by counting the occurrences of an event, for example, the number of accidents on the roads, the number of patents granted to countries and companies, or the number of deaths due to a particular disease. Count data are widely used in the health and economic fields, so researchers are interested in models for this type of data. In these models, the count data are treated as the response variable [34].
In the econometric literature, the Poisson regression model has been widely used to analyze count data. For example, in the field of healthcare and medicine, [4] used the Poisson regression model to model daily COVID-19 death cases in Nigeria. Many studies apply the Poisson regression model in other fields; see, e.g., [21,32,35,30,29,8].
In recent years, panel data analysis has become one of the most active fields in the econometrics literature, since panel data combine the time dimension and the cross-sectional dimension to extract the maximum benefit from the data. According to [9], a panel dataset combines cross-section and time-series observations, where the cross-section is observed over several periods.
Panel data regression models have become widely used among applied researchers due to their multiple advantages compared to cross-section or time-series data models. Therefore, we will discuss one of the most important models used in panel data regression modeling, which is the fixed effects model. Many studies used the fixed effects model; see, e.g., [1,27,37,36,38,16,5].
Beyond the various cross-sectional models for count data regression, further progress has been made by extending count data models to panel data regression. In count panel data models, the dependent variable does not follow the normal distribution, because it takes nonnegative integer values. [18] introduced a variety of econometric models to deal with count data in panel data models, including the Poisson model with fixed effects estimated by the ML method. For more studies and examples on count panel data in the fields of econometrics, political, biological, and health sciences, see, for example, [39,7,28,17,23,26,22,14,15].
The existing studies on robust estimators of the Poisson regression model are limited, although some studies deal with outliers in generalized linear models; see [20,25,3,24,2]. These studies considered the Poisson model for data that contain outliers and used Monte Carlo simulation to compare the performance of the non-robust ML estimator with some robust estimators when the data contain outliers. [11] developed a robust generalized quasi-likelihood estimation method to estimate the parameters in longitudinal models for binary and count data with outliers. The simulation results show that the robust generalized quasi-likelihood produces unbiased and consistent estimates compared to the classical generalized quasi-likelihood estimation method.
Although panel data regression models are important, panel data often contain outliers. These outliers have adverse effects on classical estimation methods, and studies of robust estimation methods for the fixed-effects panel data regression model are few. For studies on robust methods for the fixed effects model, see [12,6,33,13].
In the econometric literature, there is no specific robust method for estimating the parameters of the FEP model. Therefore, in this paper, we introduce robust estimation methods for the FEP model, for the case in which the count panel data contain outliers, based on the weight functions of Huber, Hampel, and Tukey bisquare.

Fixed Effects Poisson and ML Estimator
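The Poisson panel model describes the count of randomly occurring events for individual $i$ at time $t$. The probability mass function of the FEP model is given by:

$$P(Y_{it} = y_{it}) = \frac{e^{-\mu_{it}}\,\mu_{it}^{y_{it}}}{y_{it}!}, \qquad y_{it} = 0, 1, 2, \ldots \tag{1}$$

where the values of the dependent variable $y_{it}$ are nonnegative integers, i.e., the model allows for counts with $y_{it} \ge 0$. Based on $y_{it}$, we can write the model in terms of the mean of the response as:

$$y_{it} = \exp(\delta_i + X'_{k,it}\beta) + u_{it} = \alpha_i \lambda_{it} + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T \tag{2}$$

where $y_{it}$ is the dependent variable for individual $i$ at time $t$, $X_{k,it}$ is the $it$-th observation on the $k$ explanatory variables, $\lambda_{it} = \exp(X'_{k,it}\beta)$, and $\alpha_i = \exp(\delta_i)$ represents the constant term for the cross-sections, which differs from unit to unit and is fixed over time. The intercept $\alpha_i$ captures the unobserved effect of the variables specific to the $i$-th individual over time, $\beta$ is the vector of the regression coefficients, and $u_{it}$ is the error term of the model. We can use the ML estimation method to estimate the regression coefficients of model (1). The joint probability function for the $i$-th individual for model (1) is given by:

$$P(y_{i1}, \ldots, y_{iT}) = \prod_{t=1}^{T} \frac{e^{-\mu_{it}}\,\mu_{it}^{y_{it}}}{y_{it}!} \tag{3}$$

where $\mu_{it} = \exp(\delta_i + X'_{k,it}\beta) = \alpha_i \lambda_{it}$. Taking the logarithm of the joint probability function in (3) and summing over the $T$ periods, the log-likelihood function of individual $i$ is:

$$\ln L_i(\alpha_i, \beta) = -\alpha_i T \bar{\lambda}_i + T \bar{y}_i \ln \alpha_i + \sum_{t=1}^{T} y_{it} \ln \lambda_{it} - \sum_{t=1}^{T} \ln(y_{it}!) \tag{4}$$

where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and $\bar{\lambda}_i = \frac{1}{T}\sum_{t=1}^{T} \lambda_{it}$. Differentiating (4) with respect to $\alpha_i$ and setting the derivative to zero, we get the ML estimator for $\alpha_i$:

$$\hat{\alpha}_i = \frac{\bar{y}_i}{\bar{\lambda}_i} \tag{5}$$

Substituting (5) into (4), differentiating with respect to $\beta$, setting the derivative to zero, and taking into consideration all $N$ individuals, we get:

$$\sum_{i=1}^{N} \sum_{t=1}^{T} X_{k,it} \left( y_{it} - \frac{\bar{y}_i}{\bar{\lambda}_i} \lambda_{it} \right) = 0 \tag{6}$$

The ML estimates for $\beta$ have no closed-form solution, so numerical search procedures are used to find the ML estimates for $\beta$ in (6).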
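To make the ML conditions concrete, the following minimal Python sketch evaluates the concentrated estimate of $\alpha_i$ (the individual mean of $y_{it}$ divided by the individual mean of $\lambda_{it} = \exp(X'_{k,it}\beta)$) and the resulting score for $\beta$ referred to as (6). The function names and the nested-list data layout are assumptions made for this illustration; they are not taken from the study, which used R.

```python
import math

def alpha_hat(y_i, lam_i):
    """ML estimate of alpha_i for one individual: mean(y_i) / mean(lambda_i)."""
    return sum(y_i) / sum(lam_i)  # the 1/T factors cancel

def fep_score(beta, X, y):
    """Score for beta after concentrating out alpha_i.

    X[i][t] is the list of k regressors for individual i at time t and
    y[i][t] is the matching response (hypothetical layout).
    Returns sum_i sum_t x_it * (y_it - alpha_hat_i * lambda_it).
    """
    k = len(beta)
    score = [0.0] * k
    for X_i, y_i in zip(X, y):
        lam_i = [math.exp(sum(b * x for b, x in zip(beta, x_it))) for x_it in X_i]
        a_i = alpha_hat(y_i, lam_i)
        for x_it, y_it, lam_it in zip(X_i, y_i, lam_i):
            resid = y_it - a_i * lam_it
            for j in range(k):
                score[j] += x_it[j] * resid
    return score

# Illustration: responses are set exactly to their means (so they are not
# integers), which makes the score vanish at the true beta.
beta_true = [1.0, -0.5]
X = [[[0.1, 0.3], [0.8, -0.2], [-0.4, 0.6]],
     [[0.5, 0.5], [-0.3, 0.9], [0.2, -0.1]]]
alphas = [2.0, 5.0]
y = [[a * math.exp(sum(b * x for b, x in zip(beta_true, x_it))) for x_it in X_i]
     for a, X_i in zip(alphas, X)]
print(fep_score(beta_true, X, y))   # numerically zero in both components
print(fep_score([0.0, 0.0], X, y))  # non-zero away from the true beta
```

A numerical solver would search for the $\beta$ at which this score is zero; the sketch only evaluates the condition and does not perform that search.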

Proposed Robust Estimators
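Robust regression for the Poisson panel model is an important tool for analyzing count panel data because it provides reliable results in the presence of outliers. The ML estimates from (6) can be affected when the count panel data contain outliers. Therefore, we use the M estimation method to obtain stable estimates for the coefficients in the FEP model. [19] generalized the median to a larger class of estimators, called M estimators (or ML-type estimators). The M estimation method is based on minimizing a function of the residuals (disturbances). The residuals corresponding to the $it$-th observation in the FEP model are:

$$\xi_{it} = \frac{R_{it}(\alpha_i, \beta)}{\sqrt{\alpha_i \lambda_{it}}}, \qquad R_{it}(\alpha_i, \beta) = y_{it} - \alpha_i \lambda_{it}$$

We can write the M estimator for the FEP model by minimizing the objective function $\rho$ over all $\beta$ as follows:

$$\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{N} \sum_{t=1}^{T} \rho(u_{it}), \qquad u_{it} = \frac{\xi_{it}}{\sigma} \tag{7}$$

where $\rho(\cdot)$ is a continuous and symmetric objective function that satisfies certain properties. Often $\rho(\cdot)$ can be formed by using a generalized linear combination of the residuals. Here $u_{it}$ is the standardized residual, and $\sigma$ is the median absolute deviation defined by:

$$\sigma = M\left( \left| \xi_{it} - M(\xi_{it}) \right| \right)$$

where $M$ is the median. The M estimator of $\beta$ is based on the function $\rho(u_{it})$. Differentiating the objective function $\rho$ with respect to the coefficients $\beta$, and setting the partial derivatives to zero in (7), we find:

$$\sum_{i=1}^{N} \sum_{t=1}^{T} \psi(u_{it}) \frac{\partial u_{it}}{\partial \beta} = 0 \tag{8}$$

where $\psi(u_{it}) = \rho'(u_{it})$ is the derivative of $\rho(u_{it})$ and is called the influence (score) function. If we define a weight function $W_\xi(u_{it}) = \psi(u_{it}) / u_{it}$, we can rewrite (8) using the weights and obtain the first-order condition for M estimators as follows:

$$\sum_{i=1}^{N} \sum_{t=1}^{T} W_\xi(u_{it})\, u_{it}\, \frac{\partial u_{it}}{\partial \beta} = 0 \tag{9}$$

Solving (9) with the weights of Huber, Hampel, and Tukey bisquare yields three robust estimators for $\beta$ in the FEP model. We can write the weight functions of Huber, Hampel, and Tukey bisquare in panel data regression as follows [10]:

• Huber's weight function:
$$W_\xi(u_{it}) = \begin{cases} 1, & |u_{it}| \le c \\ c / |u_{it}|, & |u_{it}| > c \end{cases}$$

• Hampel's weight function:
$$W_\xi(u_{it}) = \begin{cases} 1, & |u_{it}| \le a \\ a / |u_{it}|, & a < |u_{it}| \le b \\ \dfrac{a\,(c - |u_{it}|)}{|u_{it}|\,(c - b)}, & b < |u_{it}| \le c \\ 0, & |u_{it}| > c \end{cases}$$

• Tukey's weight function:
$$W_\xi(u_{it}) = \begin{cases} \left[ 1 - (u_{it}/c)^2 \right]^2, & |u_{it}| \le c \\ 0, & |u_{it}| > c \end{cases}$$

where $a$, $b$, and $c$ are tuning constants specific to each weight function.

We now present the algorithm of the robust M estimation method for the FEP model, which should be used when the count panel data contain outliers to obtain a robust estimation.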
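The three weight functions can be sketched in Python as follows. This is an illustration rather than the authors' implementation: the tuning constants used as defaults (1.345 for Huber, 2, 4, 8 for Hampel, and 4.685 for Tukey bisquare) are conventional choices and are assumptions here, since the text does not state the values used in the study.

```python
def huber_weight(u, c=1.345):
    """Huber weight: 1 inside [-c, c], c/|u| outside (c is an assumed default)."""
    r = abs(u)
    return 1.0 if r <= c else c / r

def hampel_weight(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending weight (a, b, c are assumed defaults)."""
    r = abs(u)
    if r <= a:
        return 1.0
    if r <= b:
        return a / r
    if r <= c:
        return a * (c - r) / (r * (c - b))
    return 0.0

def tukey_weight(u, c=4.685):
    """Tukey bisquare weight: (1 - (u/c)^2)^2 inside [-c, c], 0 outside."""
    if abs(u) > c:
        return 0.0
    return (1.0 - (u / c) ** 2) ** 2

# Weights for a few standardized residuals of increasing size.
for u in (0.5, 3.0, 6.0, 10.0):
    print(u, huber_weight(u), hampel_weight(u), tukey_weight(u))
```

Huber's weight decays but never reaches zero, whereas the Hampel and Tukey weights vanish beyond their largest constant, so sufficiently extreme observations are removed from the fit entirely.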
1. Estimate the regression coefficients for the count panel data using the FEP model.
2. Calculate the initial estimates ($\hat{\beta}_{FPML}$) using the ML estimation method.
3. Calculate the residuals $\xi_{it} = R_{it}(\alpha_i, \beta) / \sqrt{\alpha_i \lambda_{it}}$.
4. Calculate the median absolute deviation $\sigma$.
5. Calculate the standardized residuals $u_{it} = \xi_{it} / \sigma$.
6. Calculate the weights $W_\xi(u_{it})$.
7. Estimate the $\hat{\beta}_{FPHR}$, $\hat{\beta}_{FPHM}$, and $\hat{\beta}_{FPTK}$ estimators based on $W_\xi(u_{it})$.
8. Repeat steps 3-7 until the $\hat{\beta}_{FPHR}$, $\hat{\beta}_{FPHM}$, and $\hat{\beta}_{FPTK}$ estimates converge.
9. Examine the significance of the independent variables on the dependent variable and compare the performance of these estimators using some criteria, for example:

$$\mathrm{AIC} = -2 \ln L + 2p, \qquad \mathrm{BIC} = -2 \ln L + p \ln n$$

where $p$ is the number of parameters, $n$ denotes the total number of observations, and $L$ denotes the ML value.
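The iteration in steps 2 to 8 can be sketched in Python for the simplest case of a single regressor. This is a hedged illustration under stated assumptions, not the authors' R code: it reads the robust step as a weighted ML fit, pools all residuals when computing the median absolute deviation, rescales it by 1.4826, uses Tukey's weight with the conventional constant 4.685, and solves the weighted score by bisection. All function names are hypothetical.

```python
import math
from statistics import median

def tukey_weight(u, c=4.685):
    """Tukey bisquare weight; c = 4.685 is a conventional default (assumption)."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0

def weighted_alpha(beta, x_i, y_i, w_i):
    """Weighted analogue of alpha_hat_i: sum(w * y) / sum(w * lambda)."""
    den = sum(w * math.exp(beta * x) for w, x in zip(w_i, x_i))
    return sum(w * y for w, y in zip(w_i, y_i)) / den if den > 0 else 0.0

def weighted_score(beta, x, y, w):
    """Weighted concentrated score for a scalar beta."""
    total = 0.0
    for x_i, y_i, w_i in zip(x, y, w):
        a = weighted_alpha(beta, x_i, y_i, w_i)
        total += sum(wt * xt * (yt - a * math.exp(beta * xt))
                     for wt, xt, yt in zip(w_i, x_i, y_i))
    return total

def solve_beta(x, y, w, lo=-10.0, hi=10.0, iters=100):
    """Bisection; for fixed nonnegative weights the score is non-increasing in beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_score(mid, x, y, w) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pearson_residuals(beta, x, y, w):
    """Step 3: (y - alpha*lambda) / sqrt(alpha*lambda), with a small floor on the mean."""
    out = []
    for x_i, y_i, w_i in zip(x, y, w):
        a = weighted_alpha(beta, x_i, y_i, w_i)
        mu_i = [max(a * math.exp(beta * xt), 1e-12) for xt in x_i]
        out.append([(yt - m) / math.sqrt(m) for yt, m in zip(y_i, mu_i)])
    return out

def robust_fep(x, y, weight=tukey_weight, max_iter=50, tol=1e-8):
    w = [[1.0] * len(x_i) for x_i in x]
    beta = solve_beta(x, y, w)                    # step 2: unit weights give the ML fit
    for _ in range(max_iter):
        xi = pearson_residuals(beta, x, y, w)     # step 3
        flat = [r for row in xi for r in row]
        med = median(flat)
        sigma = 1.4826 * median([abs(r - med) for r in flat])  # step 4 (scaled MAD, assumption)
        if sigma < 1e-10:                         # residuals essentially zero: nothing to downweight
            break
        w = [[weight(r / sigma) for r in row] for row in xi]   # steps 5-6
        new_beta = solve_beta(x, y, w)            # step 7
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:                             # step 8
            break
    return beta, w

# Deterministic illustration: rounded Poisson means with true beta = 1 and one gross outlier.
grid = [-0.5, -0.2, 0.1, 0.4, 0.7, 1.0]
x = [list(grid) for _ in range(4)]
y = [[round(a * math.exp(v)) for v in grid] for a in (2.0, 4.0, 6.0, 8.0)]
y[0][0] = 60
unit = [[1.0] * len(grid) for _ in range(4)]
beta_ml = solve_beta(x, y, unit)
beta_rob, w_rob = robust_fep(x, y)
print("ML:", beta_ml, "robust:", beta_rob)
```

Bisection is used only because the weighted score for a scalar coefficient is monotone, which keeps the sketch short; with several regressors a Newton-type solver would be the natural replacement.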
It is known that classical estimation methods, such as ordinary least squares and ML, are affected by outliers. Therefore, robust estimation methods have been proposed to obtain good and stable results in the presence of outliers. To achieve this resistance, we used the M estimation method, which aims at high robustness while retaining high efficiency; in the presence of outliers, the M estimator is more efficient than the traditional estimators; see, e.g., [41,31].

Monte Carlo Simulation Study
The Monte Carlo simulation was conducted to examine the effect of outliers on the estimates in the FEP model. The simulation design follows previous studies; see [40,2]. R software is used to conduct our simulation.

Algorithm of Simulation
The algorithm of the Monte Carlo simulation study for the FEP model is based on the following:
1. We design the panel data set to obtain the total number of observations ($n = N \times T$) with the following steps:
(a) The number of cross-sections ($N$) was chosen to be 20, 50, 100, and 200 to represent small, moderate, and large samples of individuals.
(b) The number of time periods ($T$) was chosen to be 5, 10, and 20 to represent different lengths of the time dimension.
2. We generate count panel data as follows:
(a) The vector of true parameters was chosen to be $\beta = 1$.
(b) The independent variables ($X_{k,it}$) were generated from the uniform distribution on the interval $(-0.5, 1)$, where the number of independent variables was $k = 3$ and $6$.
(c) The dependent variable ($y_{it}$) was generated from the Poisson distribution with a mean equal to $\exp(\delta_i + X'_{k,it}\beta)$.
(d) The percentage of outliers ($\tau\%$) in the dependent variable was chosen to be 0, 5, 10, and 20. This percentage is calculated from the total number of observations ($n$). When the proportion of outliers equals zero ($\tau = 0\%$), the count panel data do not contain outliers.
(e) The outliers were generated from the Poisson distribution with a mean equal to $8\,\mathrm{IQR}[\exp(\delta_i + X'_{k,it}\beta)]$, where IQR is the interquartile range.
3. We estimate the regression coefficients: the FPML estimator ($\hat{\beta}_{FPML}$) using the ML estimation method, and the FPHR estimator ($\hat{\beta}_{FPHR}$), FPHM estimator ($\hat{\beta}_{FPHM}$), and FPTK estimator ($\hat{\beta}_{FPTK}$) using the weighted ML estimation method with weight $W_\xi(u_{it})$.
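4. For all simulation experiments, we ran 1000 replications.
5. We examine the performance of these estimators with the following steps:
(a) Calculate the mean squared error (MSE) and mean absolute error (MAE) for the different values of $N$, $T$, $k$, and $\tau$, for each parameter separately, as follows:

$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{1000} \sum_{l=1}^{1000} (\hat{\beta}_l - \beta)^2, \qquad \mathrm{MAE}(\hat{\beta}) = \frac{1}{1000} \sum_{l=1}^{1000} \left| \hat{\beta}_l - \beta \right|$$

where $\hat{\beta}_l$ is the vector of estimated values of $\beta$ at the $l$-th of the 1000 simulation replications, and $\beta$ is the vector of true parameters. The better estimator is the one with the smaller total MSE and total MAE.
(b) Calculate the mean relative efficiency (MRE) of the M estimators for all $T$ and $\tau$ for each $N$ separately to compare the performance of the estimators (FPHR, FPHM, and FPTK). The MRE is calculated as:

$$\mathrm{MRE}(\hat{\beta}_R) = \frac{\mathrm{MSE}(\hat{\beta}_{FPML})}{\mathrm{MSE}(\hat{\beta}_R)}$$

where $\hat{\beta}_R$ represents the $\hat{\beta}_{FPHR}$, $\hat{\beta}_{FPHM}$, and $\hat{\beta}_{FPTK}$ estimators. The most efficient estimator is the one with the largest MRE.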
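The data-generating part of the design (steps 1 and 2) can be sketched in Python as follows. Several details are assumptions made only for this illustration: $\beta = 1$ is read as a vector of ones, the fixed effects $\delta_i$ are drawn from a uniform distribution on $(0, 1)$ because their distribution is not stated, the outlier mean is taken as 8 times the interquartile range of the Poisson means, and the helper names are hypothetical. The study itself used R.

```python
import math
import random
from statistics import quantiles

def rpois(mean, rng):
    """One Poisson draw by Knuth's multiplication method (adequate for means well below ~700)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_panel(N=20, T=5, k=3, tau=0.10, seed=1):
    """Generate X[i][t][j], y[i][t] and the list of contaminated (i, t) cells."""
    rng = random.Random(seed)
    beta = [1.0] * k                                    # assumption: beta = 1 as a vector of ones
    delta = [rng.uniform(0.0, 1.0) for _ in range(N)]   # assumption: delta_i ~ U(0, 1)
    X = [[[rng.uniform(-0.5, 1.0) for _ in range(k)] for _ in range(T)]
         for _ in range(N)]
    mu = [[math.exp(delta[i] + sum(b * v for b, v in zip(beta, X[i][t])))
           for t in range(T)] for i in range(N)]
    y = [[rpois(mu[i][t], rng) for t in range(T)] for i in range(N)]

    # Outliers: replace a share tau of all n = N*T responses with draws
    # from a Poisson whose mean is 8 * IQR of the clean means (assumption).
    q1, _, q3 = quantiles([m for row in mu for m in row], n=4)
    out_mean = 8.0 * (q3 - q1)
    n_out = round(tau * N * T)
    cells = rng.sample([(i, t) for i in range(N) for t in range(T)], n_out)
    for i, t in cells:
        y[i][t] = rpois(out_mean, rng)
    return X, y, cells

X, y, cells = simulate_panel(N=20, T=5, k=3, tau=0.10, seed=1)
print(len(y), len(y[0]), len(cells))
```

Repeating this generation 1000 times and fitting each estimator on every replicate gives the samples of estimates from which the MSE, MAE, and MRE are computed.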

Simulation Results
The results of the Monte Carlo simulation for the small, moderate, and large samples are provided in Tables 1 to 4. Specifically, Tables 1 to 4 present the total MSE and total MAE values of all estimators (non-robust and robust) when the number of explanatory variables equals 3 and 6, for cross-section sizes of 20, 50, 100, and 200 and time periods of 5, 10, and 20.
According to the results of these tables, the values of MSE and MAE increase when both the number of explanatory variables and the percentage of outliers increase. When the proportion of outliers increases from 5% to 20%, the MSE and MAE values are inflated. However, this increase is large for the FPML estimator and small for the FPHR, FPHM, and FPTK estimators. In the case of count panel data that do not contain outliers ($\tau = 0\%$), the non-robust estimator (FPML) performs better than the robust estimators (FPHR, FPHM, and FPTK) for all values of $N$ and $T$.
On the other hand, the MSE and MAE values of the estimators generally decrease when the cross-section and time-series dimensions increase. The robust estimators (FPHR, FPHM, and FPTK) are better than the non-robust estimator (FPML) when the count panel data contain outliers, for all values of $N$ and $T$. The rate of decrease in the MSE and MAE values is also greater for the robust estimators than for the non-robust estimator.
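The comparison criteria behind these results can be sketched in Python for a single coefficient. The relative efficiency is written here as the ratio of the FPML mean squared error to that of a robust estimator, which is an assumption consistent with the statement that a larger MRE indicates a more efficient estimator; the function names and the four toy estimates are hypothetical and are not simulation output.

```python
def mse(estimates, true_value):
    """Mean squared error of replicated estimates of one coefficient."""
    return sum((b - true_value) ** 2 for b in estimates) / len(estimates)

def mae(estimates, true_value):
    """Mean absolute error of replicated estimates of one coefficient."""
    return sum(abs(b - true_value) for b in estimates) / len(estimates)

def relative_efficiency(ml_estimates, robust_estimates, true_value):
    """MSE of the ML estimates divided by MSE of the robust estimates (assumed form)."""
    return mse(ml_estimates, true_value) / mse(robust_estimates, true_value)

# Made-up estimates from four hypothetical replications, true coefficient 1.
ml = [0.5, 1.5, 0.0, 2.0]
robust = [0.9, 1.1, 0.8, 1.2]
print(mse(ml, 1.0), mae(ml, 1.0))
print(relative_efficiency(ml, robust, 1.0))
```

A value above 1 means the robust estimator has the smaller mean squared error; summing the per-coefficient values over all coefficients gives the total MSE and total MAE reported in the tables.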
Figures 1 and 2 show the MRE of the robust estimators (FPHR, FPHM, and FPTK), clustered by time periods (from 5 to 20) and percentages of outliers (from 5% to 20%) for each cross-section size separately, when the number of parameters is $k = 3$ and $k = 6$. These figures indicate that the MRE values of the FPTK estimator are larger than those of the FPHR and FPHM estimators for each cross-section size, which means that the FPTK estimator is more efficient than the FPHR and FPHM estimators across the different $N$, $T$, and $\tau$ values. When $N$ increases for each $T$ and $\tau$, the efficiency of FPTK increases. In Figure 2, the efficiency of the FPHM increases, but the FPTK estimator is still more efficient than the FPHM estimator. Efficiency increases when $N = 200$, $T = 20$, and $\tau = 20\%$.
Table 6 presents the results of the non-robust and robust estimates for the FEP model. It can be seen that all the explanatory variables for the FPHM, FPHR, and FPTK estimators have statistically significant effects on the response variable, so the estimated coefficients are suitable for the robust FEP model. In the non-robust FEP model, all the explanatory variables are statistically significant except $X_{4,it}$. Based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) presented in Figure 4, we concluded that the robust estimators FPHM, FPHR, and FPTK performed better than the non-robust estimator FPML, where the best estimator is the one that has the minimum value of AIC or BIC.
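For reference, the two information criteria can be computed as in the following Python sketch, which uses the standard definitions AIC = -2 ln L + 2p and BIC = -2 ln L + p ln n together with the Poisson log-likelihood. The function names are hypothetical and the numbers in the example are arbitrary; they are not the values shown in Figure 4.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))

def aic(loglik, p):
    """Akaike Information Criterion for p estimated parameters."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian Information Criterion for p parameters and n observations."""
    return -2.0 * loglik + p * math.log(n)

# Arbitrary illustration: three counts with fitted means, two parameters.
ll = poisson_loglik([0, 2, 5], [0.8, 2.1, 4.6])
print(aic(ll, p=2), bic(ll, p=2, n=3))
```

Lower values indicate the preferred model, and BIC penalizes additional parameters more heavily than AIC once n exceeds about 7, since ln n then exceeds 2.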

Conclusions
This paper presented a robust estimation method for the FEP model for analyzing count panel data with outliers. We have introduced robust estimators based on the M estimation method, including FPHM, FPHR, and FPTK, and compared these with the non-robust FPML estimator.
To examine the performance of the estimators, we conducted a Monte Carlo simulation study and an empirical application to patent data from high-income European countries. The simulation results show that, when there are no outliers, the FPML estimator is better, while the robust estimators have lower performance. In the presence of outliers, the weighted ML estimators (FPHM, FPHR, and FPTK) are more effective than the ML estimator. The results of the application show that the robust estimators are better than the non-robust estimator when the data contain outliers. In addition, FPTK is more efficient than FPHM and FPHR.
This study is expected to provide useful information for both researchers and policy makers involved in scientific research and development. Furthermore, it improves the statistical methods used to analyze count panel data, particularly when dealing with outliers.

Figure 1. The MRE of the Robust Estimators when k = 3.

Figure 2. The MRE of the Robust Estimators when k = 6.

Figure 4. The AIC and BIC Values for Estimators.

Table 1. MSE and MAE Values of Estimators when N = 20.

Table 2. MSE and MAE Values of Estimators when N = 50.

Table 3. MSE and MAE Values of Estimators when N = 100.

Table 4. MSE and MAE Values of Estimators when N = 200.

Table 6. Estimates of Fixed Effects Poisson Panel Data Model. Note: *, ** indicate that the level of significance is at 1% and 5%, respectively.