A different approach for choosing a threshold in peaks over threshold

Abstract In Extreme Value methodology the choice of threshold plays an important role in efficient modelling of observations exceeding the threshold. The threshold must be chosen high enough to ensure an unbiased extreme value index but choosing the threshold too high results in uncontrolled variances. This paper investigates a generalized model that can assist in the choice of optimal threshold values in the \gamma positive domain. A Bayesian approach is considered by deriving a posterior distribution for the unknown generalized parameter. Using the properties of the posterior distribution allows for a method to choose an optimal threshold without visual inspection.


Introduction
In Extreme Value methodology we assume that a random sample X 1 , ..., X n from an unknown distribution F belongs to the domain of attraction of a generalized extreme value distribution. Let X n,n = max (X i 's) and assume there exists sequences a n > 0 and b n such that as n → ∞, P X n,n − b n a n → G γ (x) , where γ is known as the extreme value index (EVI), Beirlant et al. (2004). All extreme value distributions, G γ (x) = exp − (1 + γx) − 1 /γ for 1 + γx > 0, x ∈ R, can occur as limits in (1). It is well known that P (X − u > y |X > u) = F (u + y) F (u) → H γ (y) = (1 + γy) − 1 /γ , x > u, γ ∈ R.
(2) See for example Beirlant et al. (2004) and de Haan & Ferreira (2005) for further details. From (2) it states that the excesses of the conditional distribution of X − u, given X > u can be modelled through a peaks over threshold (POT) model, the generalized Pareto distribution (GPD) given in (2) as H γ . Very often in literature the focus is only on the Pareto type case where γ in (1) is positive. The survival function of the Pareto type distribution is given as where l (x) is known as the slowly varying function that satisfies l(xu) l(u) → 1 as u → ∞ (Beirlant et al., 2004), and 1 /γ is the EVI. Pareto-type distributions follows a simpler POT limit as u → ∞: For the remainder of the paper (4) will be referred to as the survival function of the Strict Pareto distribution.
The limit results in (2) and (4) entails that the threshold u should be chosen as large as possible such that the bias of γ is not too large. In contrast to that, the threshold u should be chosen as small as possible such that the estimation variation is limited. This is a complicated trade off, which makes the threshold choice difficult in practice. Often the threshold is chosen as one of the high data points in an ordered sample X 1,n ≤ X 2,n ≤ · · · ≤ X n,n .
Several procedures have been proposed for reducing the bias and limiting the variance, where a second order (sometimes third order) of (4) is assumed, ssee for example Feuerverger & Hall (1999), Gomes et al. (2000), Beirlant et al. (1999Beirlant et al. ( , 2009Beirlant et al. ( , 2019 and Caeiro & Gomes (2011) to name a few. This paper slightly touches on the bias reduction and variance stability of the EVI estimate but is not an extensive study on this topic. The paper's focus is on the investigation of a practical and rather simple method to choose an optimal threshold value if the data follows a POT distribution (with γ > 0). Often in practice the EVI in (4) is estimated, through various methods, such as maximum likelihood (ML) or Method of moments at different threshold values. These estimates are then plotted against the various threshold values, and the graph is visually inspected to find a threshold range where the EVI estimate seems to be stable. Although it is a popular method in practice it is not always a clear picture with a lot of volatility.
It is sometimes difficult to see a stable area. The paper is arranged as follows: In Section 2 a Topp-Leone Pareto distribution is considered where a generalization parameter is introduced to generalize the strict Pareto distribution. This additional parameter gives valuable insight to the choice of threshold. In Section (3), a Bayesian, rather than a classical approach is considered in the parameter estimation and modelling process. A small simulation study is conducted in Section 4 to investigate the effectiveness of the generalized model. The generalized model is applied to a real data set in Section 5. Section 6 explains the method we propose for choosing an optimum threshold.

Topp-Leone Pareto distribution
The Topp & Leone (1955) distribution was generalized by Rezaei et al. (2017) where 0 ≤ x ≤ 1 is replaced by die CDF of any baseline distribution. The CDF and probability density of the Topp-Leone generated family of distributions are given by and where g (x; θ θ θ ) and G (x; θ θ θ ) denote the probability density and CDF of the baseline distribution with parameter set θ θ θ , Razaei et al. (2017). In this study we consider the baseline distribution as the Strict Pareto distribution with probability density and CDF given respectively by where 1 /γ denotes the EVI . Substituting (7) and (8) into (5) and (6) yields the Topp-Leone Pareto (TLPa) distribution with PDF and CDF given respectively as and where y > 1, γ > 0 and α > 0.
The survival function of the TLPa can be expanded through a Binomial expansion such that is the slowly varying function and 1 /2γ is the EVI. If α = 1 (10) and (9) become the Strict Pareto probability and CDF with an EVI of 1 /2γ. In this study we are focusing our analysis on the assumption that as n → ∞ the relative excesses ( X /u) will follow a Strict Pareto distribution (this assumption is well-known in literature) or equivalently a TLPa with α = 1.
It can easily be seen that when a Jeffreys prior, p (γ) ∝ 1 /γ, is assumed for γ in the Strict Pareto case, the posterior distribution, given the data and the threshold, is given as: From (12) it is clear that the posterior is a gamma distribution with parameters n and ∑ n i=1 logy i , where n denotes the number of observations above the threshold. An estimate of γ can be the mean of the above gamma distribution. The EVI of the Strict Pareto will then follow an inverse gamma distribution, EVI SP ∼ InvGamma n, (∑ n i=1 logy i ) −1 , where SP represents the Strict Pareto case and the second parameter is the rate parameter.
In the case of the TLPa the joint posterior is derived by assuming an independent Jeffreys prior, p (γ, α) ∝ 1 /γα and the likelihood is given as The joint posterior will be The conditional posteriors can be approximated as and It now follows that the EVI of the TLPa(conditional on α and y y y) is inverse gamma, EVI TLPa |α, y y y ∼ InvGamma nα, (∑ n i=1 log (y i )) −1 , where the second parameter is the rate parameter. The derivation of (15) is trivial. The approximation of (16) is given in the Appendix. Equations (12), (13) and (14) will be used in the simulation study of Section 4.

Simulation study
In this simulation study observations are simulated from heavy tailed distributions such as the Fréchet and the Burr distributions where γ > 0. At each threshold level (from the smallest sorted observation to the second largest sorted observation) the parameters of the two models (SP and TLPa) are estimated.
In the case of the SP,γ is estimated as the mean of the gamma distribution in (12). A Gibbs sampler is considered for the TLPa. A starting value for γ is chosen. For the chosen γ, anα * is drawn from (15).
Givenα * , a newγ * is drawn from the joint posterior given in (14)   It can be seen from the above cases that the EVI estimates of the TLPa is more stable, and less sensitive to the threshold choice, than the EVI estimate of the SP. This is expected since the variance of γ TLPa |α, y , decreases as the value of α increases, see (12) and (16).

Real data example
In this section the wave height data from Newlyn, Cornwall, is considered as a real data example. A POT approach is used to model this data set and according the the mean residual plot a marginal threshold is chosen at 6.1 meters, see Coles (2001). The data observations with n = 2894 is given in Figure 4.
Extreme observations can easily be observed from the figure. Figure 5 shows the EVI estimates of the two models (left) and the estimate of α (right) for the TLPa model. Figure 5 shows that α moves towards 1 around the 2800 th sorted observation. This corresponds to a wave height of 6.67 meters which is close to the 6.1 meters threshold chosen by Coles (2001). The data set can be obtained from the ismev package in R. The threshold was chosen as the 2800 th sorted observation and the quantiles (above the chosen threshold) for these two models were calculated by assuming the estimated parameter values at this threshold.
As shown in Figure 6 the log quantile-quantile plot of both distributions indicates a good fit. This is expected since the threshold of 2800 seems to be optimal from Figure 5.

Procedure to choose a threshold
Since the main focus of this paper is on threshold choice and not bias reduction this section introduces a practical and modest method to choose a threshold. The conditional posterior of α from (15) is used.
The advantage of this method is that the threshold can be chosen without visually inspecting a graph.
and using the theory that as n → ∞, α → 1, the method proposes to find the combination of γ # and u # (from a large spectrum of γ and u values) for which the Bayes estimate under squared error loss, [E (α |γ, y y y) − 1] 2 , is a minimum. Considering the above method two data sets are simulated for illustration purposes i. n = 300 observations from Fréchet (γ = 2).
For data set (i) the EVI TLPa = 1 /(2γ # ) is chosen as 0.5383 and u # is chosen as the 174 th sorted obser-vation. For data set (ii) the EVI TLPa = 1 /(2γ # ) is chosen as 1.1230 and u # is chosen as the 196 th sorted observation. For the real data example the EVI TLPa = 1 /(2γ # ) is chosen as 0.1158 and u # is chosen as the 2850 th sorted observation (u = 7.52). These chosen thresholds, together with the EVI estimates seem to be in line with the results obtained from the simulation studies (Figures 1 and 3) and the wave height application ( Figure 5). The histogram with the chosen threshold (for the wave height date) is shown in Figure 7. Another two data sets are simulated to test the appropriateness of the above method.
iii. n 1 = 500 observations from Normal(µ = 5, σ 2 = 1) and n 2 = 100 observations from SP(γ = 5) and u is chosen as the maximum Normal observation. The two simulated data sets are joined together as a single data set. The SP threshold is thus the 500 th sorted observation.
iv. n 1 = 500 observations from Normal(µ = 10, σ 2 = 16) and n 2 = 100 observations from SP(γ = 2) and u is chosen again as the maximum Normal observation. The two simulated data sets are joined together as a single data set. Again the SP threshold is the 500 th sorted observation. Figure 8 shows the histogram of data set (iii) where the threshold is indicated with a vertical line.
From Figure 8 it seems quite easy to visually distinguish between the two distributions. The threshold method is applied to the simulated data set where the threshold was chosen from the 50 percentile onward.
An EVI-threshold pair that minimizes the Bayes estimate under squared error loss is selected. The process is repeated 1000 times. Table 1 shows the mean of the 1000 chosen thresholds as well as the mean of the EVI TLPa estimates. The chosen threshold seems to be rather close to the actual threshold and the EVI estimate is close to the true EVI, 1 /γ = 0.2. Figure 9 shows the histogram of data set (iv) where the threshold is indicated with a vertical line.
From Figure 9 it is not visually possible to distinguish between the two distributions, giving the smooth transition from the Normal distribution to the SP distribution. Table 2 shows the mean of the 1000 chosen thresholds as well as the mean of the EVI TLPa estimates. Since the transition between the two distribution are rather smooth the chosen threshold is not as close to the actual threshold but still reasonable and the EVI TLPa estimate is still rather close to the true EVI, 1 /γ = 0.5.  Threshold estimate EV I estimate 457.339 0.2337 Figure 9: Histogram of data set (iv) with the chosen threshold indicated with the vertical line.

Conclusion
In this study we have shown that the TLPa is a suitable model to use when modelling extreme events above a threshold when γ > 0. The TLPa has proven to be less sensitive to the correct threshold choice.
The focus of the study was to show that the α parameter of the TLPa can successfully contribute to choosing an appropriate threshold without using any visual technique.
A Bayesian approach was considered and the conditional posterior distributions of the two parameters of the TLPa was derived. These posteriors were valuable in the simulation studies. A method for choosing an optimum threshold was introduced. This method involves finding a combination of EVI TLPa and u (threshold) values that minimizes the Bayes estimate under squared error loss. The illustrations in Section 6 shows that the method is easy to use with sufficient results.