Bayesian Estimation of The Ex-Gaussian Distribution

Fitting the exponential modified Gaussian distribution to model reaction times and drawing conclusions from its estimated parameter values is one of the most popular methods used in psychology. This paper develops a Bayesian approach to estimate the parameters of the ex-Gaussian distribution. Since the chosen priors yield posterior densities that are not of known form and are not always log-concave, we suggest using the adaptive rejection Metropolis sampling method. Applications on simulated and real data compare this method to the standard maximum likelihood estimation method as well as to quantile maximum likelihood estimation. Results show the effectiveness of the proposed Bayesian method, based on the root mean square error of the parameters estimated by the three methods.


Introduction
One of the most commonly used measures in scientific psychology is reaction time (RT) ([25]; [26]), i.e., the time taken by a human being to respond to a stimulus. Scientific psychology's interest in this measure stems from the fact that it allows researchers to explore human cognition through the experimental manipulation of the characteristics of the stimuli used (see [5]). It is generally accepted that reaction times reflect the time taken to carry out three activities: perception, cognitive processing and the preparation of the motor response. Suppose a psychologist hypothesizes that the words of a language are organized in memory according to their frequency of occurrence. The experimental manipulation of this factor should have an impact on cognitive processes. In particular, a human being should be more efficient, and therefore faster, at processing high-frequency words (e.g., maison [house]) than low-frequency words (e.g., ornithorynque [platypus]). In an experimental task (e.g., handwriting a word on a graphic tablet as soon as it is presented auditorily), this should be reflected by shorter reaction times for the first type of item compared to the second. The durations necessary to perceive the stimulus and to prepare the motor response being constant, the comparison of reaction times between lexical frequency conditions allows researchers to test the hypothesis that words are organized in memory as a function of their frequency in the language (e.g., [17]).
Based on Fisher's proposals ([19]), experimental psychology bases its conclusions on the use of hypothesis testing. Although virtuous, this practice causes several difficulties for studies using reaction times. As early as 1956, [13] drew the attention of researchers to the presence of a large number of very long RTs compared to the number of shortest values. The positive skewness of RT distributions is well documented in the literature (e.g., [24]; [10]; [28]; [20]; [16]; [18]; [29]; [21]; [25]; [26]). Another well-described aspect is the impact of this positive skewness on parameter estimates ([21]) and thus on hypothesis testing (i.e., t-tests, ANOVA, etc.). This issue is mostly managed in scientific psychology using strategies to normalize the RT distribution, such as data transformation using link functions ([6]) or right-censoring of the distribution (e.g., [22]; [14]), without, however, ignoring the problems that these strategies can generate (e.g., [9]). Another approach has been to describe the distribution of RTs as the sum of two stochastically independent random variables, one sampled from an exponential distribution and the other Gaussian distributed. This exponential modified Gaussian (ex-Gaussian) distribution (see [24]) allows psychologists to work with a theoretical distribution function closer to that of reaction times. As alternatives, [32] used the ex-Wald distribution, while [3], [11], [12] and [16] used the Wald, the Weibull and the log-normal distributions.
The aim of this work is to estimate the parameters of the exponential modified Gaussian (ex-Gaussian) distribution. The experimental conditions of scientific psychology constrain these estimates. To continue the previous example, a researcher may make the same assumption but this time focusing on a population of young children. With this kind of participant, it is difficult to collect more than 40-50 RTs. Most of the solutions proposed to estimate the parameters of the ex-Gaussian distribution are frequentist approaches based on maximization of the log-likelihood function. However, it is well known that this approach works effectively only with n > 100 samples (see [25] and [26]). This work therefore explores the effectiveness of alternative approaches for estimating the parameters of the ex-Gaussian distribution with n < 100 samples. We suggest a Bayesian parametric approach in which these parameters are considered as random variables. Thus, we choose a prior density for each parameter and compute the likelihood of the data. Then, we calculate the conditional posterior of each parameter using the chosen priors and the likelihood. Since the resulting conditional posteriors are neither of known form nor log-concave, we propose to use the adaptive rejection Metropolis sampling method to sample from them.
To validate the method and highlight its main features, we apply the proposed approach first to simulated data and then to real data sets. On simulated data, we compare the root mean square error (RMSE) of the three parameters of the ex-Gaussian obtained by applying the proposed Bayesian method, the quantile maximum likelihood (QML) method proposed by [3], [23] and [2], and the frequentist maximum likelihood estimation (MLE) method proposed by [7]. Results show that the RMSE of the three parameters obtained using the proposed Bayesian method is closer to zero than those obtained using the quantile maximum likelihood and the maximum likelihood estimation methods, which demonstrates the effectiveness of our method.
This paper is organized as follows. Section 2 defines the ex-Gaussian distribution. Section 3 presents a Bayesian parametric approach to estimate the parameters of the ex-Gaussian and is divided into two subsections. In subsection 3.1, since the parameters are considered as random variables in the Bayesian framework, a prior density is chosen for each parameter; the conditional posteriors of these parameters are then computed using these priors and the likelihood. Subsection 3.2 presents the algorithm used to estimate the parameters of the ex-Gaussian, based on the adaptive rejection Metropolis sampling method. In section 4, we recall the quantile maximum likelihood and the maximum likelihood estimation approaches in order to compare them to the proposed Bayesian approach. In section 5, we apply the proposed method to simulated data (subsection 5.1) and to a real data set (subsection 5.2) to demonstrate its effectiveness and highlight its main features, comparing the results to those obtained with the quantile maximum likelihood and the maximum likelihood estimation methods. The final section concludes the paper.

Ex-Gaussian distribution
The ex-Gaussian is a well-known distribution in cognitive psychology (see [20]). It is the sum of independent normal and exponential random variables: an ex-Gaussian random variable Z may be expressed as Z = X + Y, where X and Y are independent, X is Gaussian with mean µ and variance σ², and Y is exponential with rate λ.

The probability density function of the ex-Gaussian is

f(z | µ, σ, τ) = (1/τ) exp( (µ − z)/τ + σ²/(2τ²) ) Φ( (z − µ)/σ − σ/τ ),

so that, for a sample x, the joint density is

f(x | µ, σ, τ) = ∏_{i=1}^{n} (1/τ) exp( (µ − x_i)/τ + σ²/(2τ²) ) Φ( (x_i − µ)/σ − σ/τ ),    (1)

where x = (x_1, . . . , x_n) is the data vector of length n and Φ(z) is the cumulative standard Gaussian distribution, evaluated from −∞ to z, given by

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} exp(−t²/2) dt,

and the parameters µ, σ and τ are the mean, the standard deviation of the Gaussian and the scale of the exponential, respectively. Note that σ > 0 and τ > 0.
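As an illustration, the density above and the additive representation Z = X + Y can be sketched in a few lines of Python using only the standard library. The function names exgauss_pdf and exgauss_sample are ours, not from the paper:

```python
import math
import random

def exgauss_pdf(x, mu, sigma, tau):
    """Ex-Gaussian density f(x | mu, sigma, tau) for a single observation."""
    # Standard Gaussian cdf Phi via the error function.
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (1.0 / tau) * math.exp((mu - x) / tau + sigma**2 / (2.0 * tau**2)) \
        * phi((x - mu) / sigma - sigma / tau)

def exgauss_sample(mu, sigma, tau, n, rng=random):
    """Draw n ex-Gaussian variates as Z = X + Y: Gaussian plus exponential."""
    # expovariate takes a rate, so the rate is 1/tau for scale tau.
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau)
            for _ in range(n)]
```

A sanity check is that the sample mean of exgauss_sample approaches µ + τ, the mean of the ex-Gaussian, as n grows.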

Bayesian parametric approach
In frequentist statistical inference, the parameters of the model are assumed to be fixed quantities and the data are random; probability distributions are used to represent the data only. In Bayesian inference, however, probability distributions represent more than just the data: they also represent prior uncertainty about the model parameters. These priors are then updated with the current data, via Bayes' theorem, to produce posterior probability distributions.

Bayesian estimation
In Bayesian inference, we start by choosing a prior probability distribution for the parameters of interest θ. To simplify the computations, we use conjugate priors for the model parameters. These priors quantify the existing knowledge about θ: writing ζ = σ² and λ = 1/τ, the parameter µ is given a Gaussian prior, ζ an inverse-gamma prior and λ a gamma prior, where N, IG and G denote the Gaussian, inverse-gamma and gamma distributions, respectively. The prior distribution is then updated by the incoming data (i.e., the likelihood) to yield a posterior probability distribution under Bayes' rule:

f(µ, ζ, λ | x) = f(µ) f(ζ) f(λ) f(x | µ, ζ, λ) / f(x),    (2)

where f(µ), f(ζ) and f(λ) are the prior densities for the parameters µ, ζ and λ, respectively, and f(x | µ, ζ, λ) is the joint distribution defined in (1) with ζ = σ² and λ = 1/τ. The marginal likelihood f(x) is the probability of the observed data x and does not involve the parameters of interest. Hence, equation (2) can be expressed as

f(µ, ζ, λ | x) ∝ f(µ) f(ζ) f(λ) f(x | µ, ζ, λ),    (3)

where the symbol ∝ denotes proportionality. The likelihood function is the joint density of the data given the parameters, viewed as a function of the parameters:

f(x | µ, ζ, λ) = ∏_{i=1}^{n} f(x_i | µ, ζ, λ).

Now, to find the joint density of x and the parameters, we multiply f(x | µ, ζ, λ) by the prior distributions:

f(x, µ, ζ, λ) = f(x | µ, ζ, λ) f(µ) f(ζ) f(λ).    (4)

According to (3), the full conditional posterior distribution of each parameter is proportional to the joint density (4).    (5)

Thus, the conditional distribution of µ is given by

f(µ | x, ζ, λ) = f(µ, ζ, λ | x) / f(ζ, λ | x).    (6)

The equality (6) is obtained by using equation (5) and by noticing that the denominator f(ζ, λ | x) does not involve the parameter µ. According to (6), and by ignoring all the factors that do not involve µ, we obtain

f(µ | x, ζ, λ) ∝ f(x | µ, ζ, λ) f(µ).    (7)

We are now interested in computing the conditional posterior of ζ. Proceeding in the same way, f(ζ | x, µ, λ) can be expressed as

f(ζ | x, µ, λ) ∝ f(x | µ, ζ, λ) f(ζ).    (8)

Likewise, the conditional posterior of λ can be expressed as

f(λ | x, µ, ζ) ∝ f(x | µ, ζ, λ) f(λ).    (9)

Sampling from the conditional posteriors given in (7), (8) and (9) is somewhat complex since they are not of known density form. Moreover, the chosen priors do not yield conditional densities that are always log-concave (recall that a density f is log-concave if d²/dl² (log f(l)) ≤ 0 for all l).
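To make the structure of the conditional posteriors (7), (8) and (9) concrete, here is a minimal Python sketch of the unnormalised log conditionals: each is the ex-Gaussian log-likelihood plus the log of the corresponding prior kernel. The Gaussian, inverse-gamma and gamma hyperparameters used below (m0, s0, a, b, c, d) are illustrative placeholders of our own choosing, not values from the paper:

```python
import math

def exgauss_loglik(data, mu, zeta, lam):
    """Ex-Gaussian log-likelihood with zeta = sigma^2 and lam = 1/tau."""
    sigma, tau = math.sqrt(zeta), 1.0 / lam
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    ll = 0.0
    for x in data:
        p = phi((x - mu) / sigma - sigma / tau)
        if p <= 0.0:                      # Phi underflowed: density is ~0
            return -math.inf
        ll += -math.log(tau) + (mu - x) / tau \
              + zeta / (2.0 * tau**2) + math.log(p)
    return ll

# Unnormalised log conditional posteriors (7)-(9): likelihood + log prior.
def log_cond_mu(mu, data, zeta, lam, m0=0.0, s0=100.0):
    # Gaussian prior kernel for mu
    return exgauss_loglik(data, mu, zeta, lam) - (mu - m0)**2 / (2.0 * s0**2)

def log_cond_zeta(zeta, data, mu, lam, a=2.0, b=1.0):
    # Inverse-gamma prior kernel for zeta = sigma^2
    if zeta <= 0.0:
        return -math.inf
    return exgauss_loglik(data, mu, zeta, lam) \
        - (a + 1.0) * math.log(zeta) - b / zeta

def log_cond_lam(lam, data, mu, zeta, c=2.0, d=1.0):
    # Gamma prior kernel for lam = 1/tau
    if lam <= 0.0:
        return -math.inf
    return exgauss_loglik(data, mu, zeta, lam) \
        + (c - 1.0) * math.log(lam) - d * lam
```

These functions are exactly what a sampler such as ARMS needs: the log conditional up to an additive constant, with out-of-support values mapped to −∞.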
We suggest using the adaptive rejection Metropolis sampling (ARMS) technique proposed by [30], which provides a reasonably efficient method for sampling from the conditional posterior distributions. This algorithm is designed for sampling from densities that do not have a closed form. Note that this sampling method is available under the name arms in the package "HI" (see [8]) of the R software.
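ARMS itself builds and adapts rejection envelopes and is best used through the arms function of the "HI" package. As a structural illustration only, the sketch below replaces ARMS with a plain random-walk Metropolis step inside a Gibbs scan; it plays the same role of updating each parameter from its unnormalised log conditional, but without ARMS's adaptive envelopes. All names and the fixed proposal scales are our own:

```python
import math
import random

def metropolis_step(x, log_f, scale, rng=random):
    """One random-walk Metropolis update targeting the density exp(log_f)."""
    prop = x + rng.gauss(0.0, scale)
    # Accept with probability min(1, f(prop)/f(x)), computed on the log scale.
    if math.log(rng.random()) < log_f(prop) - log_f(x):
        return prop
    return x

def gibbs_sampler(log_conds, init, n_iter, scales, rng=random):
    """Metropolis-within-Gibbs over a parameter vector.

    log_conds[j](value, state) returns the unnormalised log conditional of
    coordinate j when it takes `value`, given the rest of `state`.
    """
    state = list(init)
    chain = []
    for _ in range(n_iter):
        for j, log_c in enumerate(log_conds):
            state[j] = metropolis_step(
                state[j], lambda v: log_c(v, state), scales[j], rng)
        chain.append(tuple(state))
    return chain
```

In the paper's setting, log_conds would hold the three log conditionals of µ, ζ and λ; ARMS would replace metropolis_step with its envelope-based update, typically mixing far faster.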
In the following section, we recall the quantile maximum likelihood (QML) estimation method proposed by [3], [23] and [2] to estimate the ex-Gaussian distribution, as well as the frequentist approach based on maximum likelihood estimation (MLE) (see [7]), in order to compare them later to the proposed Bayesian approach.

Recall on the QML and the MLE approaches
In this section, we recall the MLE and the QML estimation methods. The MLE consists in computing the log-likelihood function of the data vector x = (x_1, . . . , x_n) sampled from an ex-Gaussian distribution, then maximizing this function to estimate the parameters µ, σ and τ of the ex-Gaussian. Note that this algorithm is available under the name emg.mle in the package "emg" of the R software. The QML method is a variant of the MLE which combines the robustness of quantiles with the efficiency and consistency of MLE. Following [3], for the QML approach, we transform the data vector x = (x_1, . . . , x_n) of length n sampled from an ex-Gaussian distribution into a vector of quantile estimates (q) and a vector of counts (N) of the portions of the data vector x falling in the inter-quantile ranges. Using these vectors, we compute the quantile log-likelihood function, then maximize it to estimate the parameters of the ex-Gaussian distribution. The full algorithm can be found in [3]. Furthermore, this algorithm is available for the Matlab software in the package "QMLE.zip" (see [27]).
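As a hedged sketch of the MLE idea only (not the emg.mle implementation itself), one can minimise the negative ex-Gaussian log-likelihood numerically. The moment-based starting values below are a common heuristic of our own choosing, and SciPy's Nelder-Mead optimiser stands in for whatever optimiser emg.mle uses:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def exgauss_negloglik(params, data):
    """Negative ex-Gaussian log-likelihood; params = (mu, sigma, tau)."""
    mu, sigma, tau = params
    if sigma <= 0.0 or tau <= 0.0:       # outside the parameter space
        return np.inf
    z = (data - mu) / sigma - sigma / tau
    # log f(x) = -log(tau) + (mu - x)/tau + sigma^2/(2 tau^2) + log Phi(z),
    # with norm.logcdf giving a numerically stable log Phi.
    ll = (-np.log(tau) + (mu - data) / tau
          + sigma**2 / (2.0 * tau**2) + norm.logcdf(z))
    return -np.sum(ll)

def exgauss_mle(data):
    """Moment-based start, then Nelder-Mead minimisation (illustrative)."""
    data = np.asarray(data, dtype=float)
    m, s = np.mean(data), np.std(data)
    # Heuristic split of the sample mean/sd between the two components.
    start = np.array([m - 0.8 * s, 0.6 * s, 0.8 * s])
    res = minimize(exgauss_negloglik, start, args=(data,),
                   method="Nelder-Mead")
    return res.x  # (mu_hat, sigma_hat, tau_hat)
```

Nelder-Mead is chosen here because it needs no gradients of log Φ; a gradient-based optimiser with box constraints would be a reasonable alternative.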
In the following, we present numerical experiments applying the Bayesian, maximum likelihood and quantile maximum likelihood methods in order to compare them.

Numerical experiments
This section compares the proposed Bayesian method to the QML and MLE methods recalled above, first on simulated data and then on a real data set.
We present in Table 1 and Table 2 the root mean square error (RMSE) of the three parameters of the ex-Gaussian for n = {40, 80, 100, 110} and n = {200, 300, 500, 1000}, respectively, defined for a generic parameter θ by

RMSE(θ̂) = √( (1/S) ∑_{s=1}^{S} (θ̂^(s) − θ)² ),

where the superscript s labels the estimates obtained in simulation s and S is the number of simulations.

Table 1. Root mean square error of the three parameters µ, σ and τ for the simulated data of sizes n = 40, 80, 100 and 110 using the QML, the MLE and the Bayesian approaches.

Table 2. Root mean square error of the three parameters µ, σ and τ for the simulated data of sizes n = 200, 300, 500 and 1000 using the QML, the MLE and the Bayesian approaches.

Table 1 and Table 2 show that, when n is large, the RMSE of the three estimated parameters of the ex-Gaussian obtained using the proposed Bayesian method is closer to zero than those obtained using the QML and the MLE methods, which demonstrates the effectiveness of the proposed method. Furthermore, these tables show that for n ≤ 100 the QML is more efficient than the MLE approach, since for n ∈ {40, 80, 100} its RMSE is closer to zero, while for n > 100 the RMSE of the ex-Gaussian parameters estimated by the MLE method is closer to zero than that obtained using the QML method. Thus, the QML is more efficient than the MLE approach for sample sizes less than or equal to 100.
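The RMSE criterion above is straightforward to compute from the per-simulation estimates; a minimal Python helper (the function name is ours):

```python
import math

def rmse(estimates, true_value):
    """Root mean square error over S simulation replications:
    sqrt( (1/S) * sum_s (theta_hat^(s) - theta)^2 )."""
    s = len(estimates)
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / s)
```

For example, `estimates` would hold the S values of µ̂ returned by one estimation method across the simulated data sets, and `true_value` the true µ used to generate them.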
We provide in Table 3 the confidence intervals of the three estimators µ̂, σ̂ and τ̂.

Table 3. Confidence intervals of the three estimators µ̂, σ̂ and τ̂ for the simulated data of sizes n = 200, 300 and 500 using the Bayesian approach.
We compute the R̂ convergence diagnostic measure ([1]) for each model parameter. Recall that R̂ compares the between-chain variability to the within-chain variability; it is available under the name Rhat in the package "rstan" of the R software. For the chains to be considered properly converged, R̂ should be lower than 1.1. For the model parameters µ, ζ and λ, R̂ is equal to 1, 0.99 and 0.98, respectively. Thus, all the R̂ values are lower than 1.1 and the chains have properly converged. The coverage probability is equal to 0.942, which shows the effectiveness of our proposed method. We present in Figure 1 the RMSE of the three estimated parameters of the ex-Gaussian distribution using the proposed Bayesian method: Figure 1 shows that the RMSE of the three estimated parameters approaches zero as the sample size n increases. To check the convergence of the three estimated parameters of the ex-Gaussian toward the true parameter values, we present in Figure 2 the values of these three estimated parameters obtained for different sample sizes n. Figure 2 shows that the three estimated parameters µ, σ and τ converge toward the true values of the parameters, which are 13, 7 and 9, respectively.
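The between/within comparison behind R̂ can be sketched as follows. This is the classic Gelman-Rubin potential scale reduction factor computed from m parallel chains; modern implementations such as rstan's Rhat refine it (e.g., by splitting each chain in half), so treat this as a simplified illustration:

```python
import math

def rhat(chains):
    """Gelman-Rubin potential scale reduction for m chains of equal length n."""
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance estimate B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # Pooled posterior-variance estimate, then the scale reduction factor.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

When the chains have mixed, B is comparable to W and R̂ is close to 1; a chain stuck in a different region inflates B and pushes R̂ well above the 1.1 threshold used in the text.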

Application on real data
We present here two data examples, one with n < 100 and the other with n > 100, and compare the proposed Bayesian approach to the QML and the MLE methods.
For the first example, n < 100; more precisely, the sample size is n = 52. Reaction times were recorded in a letter copy task: one undergraduate student had to copy twice, in uppercase, the 26 letters of the French alphabet on a graphic tablet (Wacom). Letters were presented randomly. A white sheet on the graphic tablet allowed collecting the handwritten productions to check responses. The participant could not see and monitor the production, as a box hid the hand and the graphic tablet.
By applying the Bayesian, the QML and the MLE methods to the first data example, we obtain the results presented in Table 4.

Table 4. The three estimated parameters µ̂, σ̂ and τ̂ obtained by applying the QML, the MLE and the Bayesian approaches on the first data example (n = 52).
We compute R̂ for the three Bayesian model parameters µ, σ and τ, obtaining values of 1, 0.99 and 0.99, respectively. Thus, all the R̂ values are lower than 1.1 and the chains have properly converged. We now treat the case n > 100. In this example, reaction times were recorded in a picture naming task. Two hundred and thirty-three participants (n = 233) had to handwrite the label of a specific drawing (e.g., table) on a graphic tablet (Wacom). A white sheet on the graphic tablet allowed collecting the handwritten productions to check responses.
We apply the three methods to this second data example; the results are presented in Table 5.

Table 5. The three estimated parameters µ̂, σ̂ and τ̂ obtained by applying the QML, the MLE and the Bayesian approaches on the second data example (n = 233).

We compute R̂ for the three Bayesian model parameters µ, σ and τ, obtaining values of 1, 1 and 0.99, respectively. Thus, all the R̂ values are lower than 1.1 and the chains have properly converged.

Conclusion
Through its experimental approach ([19]), scientific psychology supports its conclusions with hypothesis testing. Using reaction times as a measure of behavior causes difficulties in this context because of the positive skewness in the distribution of this measure. It is generally accepted that the distribution of reaction times is well fitted by an exponential modified Gaussian (ex-Gaussian) distribution (e.g., [24]; [10]; [28]; [20]; [16]; [18]; [29]; [21]; [25]; [26]). As explained in the introduction, the sample size per experimental condition is a constraint on the use of the three parameters of the ex-Gaussian distribution. Since ML methods require larger samples than those found in experimental psychology, the reliability of estimates is questionable ([25]; [26]). The sample size constraint is a factor over which psychologists have little control, given the characteristics of the populations with which they work (patients, children, etc.) and the time necessary to collect reliable data (loss of attention of participants if the experiment is too long).
This work proposes a reliable method for estimating the three parameters of the ex-Gaussian distribution in this type of experimental context. We first replicated the results of [3] ([23]; [4]; [2]) for small samples (n < 100): the QML method should be preferred for parameter estimation. Second, for larger samples (n > 100), our results suggest that estimates are more reliable with the Bayesian approach; this was particularly the case for very large samples (n > 300). Finally, we tested the three approaches on real data (handwritten production of letters (n = 52) and of picture names (n = 233)). The values proposed by the ML and QML approaches are relatively close whatever the sample size, while the values proposed for the three parameters by the Bayesian approach are quite far from those of the other two methods. The Bayesian method being the most reliable in the analyses of the simulated data (section 5.1), it seems legitimate to conclude that, at least for the largest sample, the Bayesian approach provides the estimates closest to the real values, which are unknown in an experimental situation.
As an extension of our work, extropy could be applied in place of informational energy; see [15] for the definition of extropy.