Threshold Stochastic Volatility Models with Heavy Tails: A Bayesian Approach

This paper extends the threshold stochastic volatility (THSV) model specification proposed in So et al. (2002) and Chen et al. (2008) by incorporating thick-tails in the mean equation innovation using the scale mixture of normal distributions (SMN). A Bayesian Markov Chain Monte Carlo algorithm is developed to estimate all the parameters and latent variables. Value-at-Risk (VaR) and Expected Shortfall (ES) forecasting via a computational Bayesian framework are considered. The MCMC-based method exploits a mixture representation of the SMN distributions. The proposed methodology is applied to daily returns of indexes from BM&F BOVESPA (BOVESPA), Buenos Aires Stock Exchange (MERVAL), Mexican Stock Exchange (MXX) and the Standar & Poors 500 (SP500). Bayesian model selection criteria reveals that there is a significant improvement in model fit for the returns of the data considered here, by using the THSV model with slash distribution over the usual normal and Student-t models. Empirical results show that the skewness can improve VaR and ES forecasting in comparison with the normal and Student-t models.


Introduction
A large literature in financial econometrics has documented many stylized facts for financial asset returns. As is well known, the return series exhibits significant non-Gaussian behavior such as heavy tails and asymmetry, in addition to volatility clustering. These properties are crucial not only for describing the return distributions but also for asset allocation, option pricing, forecasting and risk management.
Many empirical studies have shown strong evidence of heavy-tailed conditional mean errors in financial time series; see for example Mandelbrot (1963) and Fama (1965). In the stochastic volatility literature, Liesenfeld and Jung (2000), Chib et al. (2002), Jacquier et al. (2004) and Abanto et al. (2010), among others, have provided consistent evidence that leptokurtic distributions, such as the Student's t, the generalized error distribution or the scale mixture of normal (SMN) distributions, are more adequate to capture this empirical regularity by relaxing the restrictive normality assumption in the distribution of the returns.
Recently, asymmetric behavior in stock returns has received attention since the work of Black (1976), see also Christie (1982) for a discussion. Furthermore, this empirical evidence has provided the explanation that unexpected returns and innovations of the volatility process are negatively correlated, which is known as the leverage effect. Black (1976) asserted that a negative (positive) return shock leads to an increase (decrease) in the company's financial leverage ratio, which has an upward (downward) effect on the volatility of its stock returns. So et al. (2002) introduced an alternative approach to incorporating asymmetry in stochastic volatility models. They developed a threshold stochastic volatility (THSV) model allowing the conditional mean and the log-volatility parameters' to change with the sign of the lagged returns. More recently, Chen et al. (2008) generalize the THSV models allowing the threshold variable to be different from zero and endowing the Student-t distribution to the mean innovation. Note that the SV model (Taylor, 1982(Taylor, , 1994) is a special case of the THSV model when asymmetry does not exist in stock returns.
In this article, we extend the setup of So et al. (2002), Chen et al. (2008) and Abanto et al. (2010), in order to take account simultaneously for heavy-tails of the returns and volatility asymmetries, by considering the THSV model with SMN distributions. We refer to this generalization as THSV-SMN distributions. Interestingly, this rich class contains as proper elements the THSV with normal (THSV-N), Student-t (THSV-T), slash (THSV-S) and the variance gamma (THSV-VG) distributions. The estimation of such intricate models is not straightforward, since volatility appears in both the mean and the variance equation and hence intensive computational methods are needed for estimating purposes. Inference in the THSV-SMN class of models is performed under a Bayesian paradigm via MCMC methods, which permits to obtain the posterior distribution of parameters by simulation starting from reasonable prior assumptions on the parameters. Based on the mixture sampler (Kim et al., 1998;Omori et al., 2007), an efficient multi-move sampler is developed to sample the log-volatilities.
The remainder of this paper is organized as follows. Section 2 shows a brief review of the SMN distributions. Section 3 describes the THSV-SMN class of models as well as the Bayesian estimation procedure using MCMC methods. We discuss some technical details about Bayesian model selection in Section 4. Section 5 is devoted to application and model comparison among particular members of the THSV-SMN models using the BOVESPA, MERVAL, MXX and SP500 returns. Finally, some concluding remarks as well as future developments are deferred to Section 6. Next, in Appendices A and B, we show some derivations for sampling from the full conditionals of parameters and states, respectively.

SMN Distributions
A random variable Y belongs to the SMN family if it can be expressed as where µ is a location parameter, X ∼ N (0, σ 2 ), λ is a positive mixing random variable with cumulative distribution function (cdf ) H(. ν) and probability density function pdf h(. ν), ν is a scalar or parameter vector indexing the distribution of λ and κ(.) is a positive weight function. As in Lange and Sinsheimer (1993) and Choy and Chan (2008), we restrict our attention to the case in that κ(λ) = 1 λ, because it leads to good mathematical properties. Given λ, we have Y λ ∼ N (µ, λ −1 σ 2 ), and the pdf of Y is given by where φ(. µ, σ 2 ) denotes the density of the univariate N (µ, σ 2 ) distribution. From a suitable choice of the mixing density h(. ν), a rich class of continuous symmetric distributions can be described by the density given in (2) that can readily accommodate thicker-tails than the normal process. Note that when λ = 1 (a degenerate random variable), we retrieve the normal distribution. Apart from the normal model, we explore three different types of heavy-tailed densities based on the choice of the mixing density h(. ν).
• The Student-t distribution, Y ∼ T (µ, σ 2 , ν) The use of the Student-t distribution as an alternative robust model to the normal distribution has frequently been suggested in the literature (Little, 1988). For the Student-t distribution with location µ, scale σ and degrees of freedom ν. Y ∼ T (µ, σ 2 , ν) is equivalent to the following hierarchical form: where G(., .) denotes the gamma distribution.
• The Slash distribution, Y ∼ S(µ, σ 2 , ν), ν > 0. This distribution presents heavier tails than those of the normal distribution and it includes the normal case when ν → ∞. The slash distribution is equivalent to the following hierarchical form: where Be(., .) denotes the beta distribution.
• The variance gamma distribution, Y ∼ VG(µ, σ 2 , ν), ν > 0. The symmetric variance gamma (VG) distribution was first proposed by Madan and Seneta (1990) to model share market returns. The VG distribution is controlled by the shape parameter ν > 0 and presents heavier tails than those of the normal distribution. The VG distribution is equivalent to the following hierarchical form: where IG(., .) denotes the inverse gamma distribution.

The Heavy-Tailed Threshold Stochastic Volatility Model
The THSV model with SMN distributions is defined by where y t and h t are respectively the compounded return and the log-volatility at time t. The innovations t and η t are assumed to be mutually independent and normally distributed with zero mean and unit variance. At time t − 1, when there is an unexpected drop in price due to the presence of bad news, y t−1 < 0 and s t = 0. In contrast, if there is good news at time t − 1 then y t−1 ≥ 0 and s t = 1. In the THSV-SMN class the parameters µ, β, α, φ, σ 2 switch between the two regimes corresponding to sign of y t−1 . In this setup, λ t is a scale factor, p(λ t ν) is the mixing density and ν the parameter that capture the heavy-tailness. The aim of the THSV-SMN class of models is to describe both mean and volatility assymetry in the presence of outliers. The THSV-SMN class includes the THSV with normal (THSV-N), Student-t (THSV-T), slash (THSV-S) and variance gamma (THSV-VG) distributions as special cases. The first model is obtained with λ t = 1 for all t, and the other ones are obtained by choosing the mixing density as: λ t ∼ G( ν 2 , ν 2 ), λ t ∼ Be(ν, 1) and λ t ∼ IG( ν 2 , ν 2 ), where G(., .), Be(., .) and IG(., .) denote the gamma, beta and inverse gamma distributions respectively.
Under a Bayesian paradigm, we use MCMC methods to conduct the posterior analysis in the next subsection. Conditionally to λ t , some derivations are common to all members of the THSV-SMN family.

Parameter Estimation via MCMC
Bayesian estimation in the THSV-SMN class of models defined by equations (6a)-(6c) is a non trivial task, because the stochastic volatility h t is an unknow process. To overcome this difficulty, we propose an algorithm based on MCMC simulation to make the Bayesian analysis feasible.
1. Set i = 0 and get starting values for the parameters θ (i) and the latent quantities λ (i) 1∶T , y 1∶T ) 5. Set i = i + 1 and return to 2 until convergence is achieved.
As described by Algorithm 1, the Gibbs sampler requires to sample parameters and latent variables from their full conditionals. Sampling the log-volatilities h 1∶T in Step 4 is the more difficult task due to the nonlinear setup in the mean equation (6a). In order to avoid the higher correlations due to the Markovian structure of the h ′ t s, we apply a multi-move sampler (Kim et al., 1998;Omori et al., 2007) in the next subsection to sample the h 1∶T at once. Multi-move algorithms are computationally efficient and convergence is achieved much faster than using a single move (Abanto et al., 2010(Abanto et al., , 2014. Details on the full conditionals of θ and the latent variable λ 1∶T are given in the Appendix A, some of them are easy to simulate from.
where log 2 t ∼ log χ 2 1 . The model defined by equations (7) and (6b) is a non-Gaussian state space model. So, following the ideas of Kim et al. (1998) and Omori et al. (2007), we approximate log 2 t by a finite mixutre of normals distributions, where, k t is a discrete mixing variable, log 2 t k t = i ∼ N (ϑ i , 2 i ) and q i = P r(k t = i). So equation (7) can be written as where ξ t k t = i ∼ N (ϑ i , 2 i ). Omori et al. (2007) argued that P = 10 gives a satisfactory approximation of the log χ 2 1 density.
For the parameters of the mixture of normal distributions, ϑ i and 2 i for i = 1, . . . , 10, we refer to Omori et al. (2007). The discrete mixing variable k t can be drawn by computing the conditional probability for i = 1, . . . , 10 and t = 1, . . . , T . We use the simulation smoothing method proposed by McCausland et al. (2011) to simulate from the states in the system defined by equations (8) Algorithm 2 describes the simulations scheme using the MMP procedure for the j−th MCMC iteration.

Forecasting Returns, Volatility, Value-at-Risk and Expected Shortfall
The K−step ahead prediction densities can be calculated using the composition method via the following recursive procedure: Numerical evaluation of the last integrals is straightforward. To initialize the recursion, we use h (i) T and θ (i) , for i = 1, . . . , N , from the MCMC output. Given these N draws, we sample h T +k from p(λ T +k θ (i) ), for i = 1, . . . , N and k = 1, . . . , K, by using (6b) and (6c), respectively. Finally, using (6a), we sample y T +k ), for i = 1, . . . , N and k = 1, . . . , K.
To check the performance of the volatility forecasts, we use an extra dataset as validation period to perform m one-step-ahead forecasts. By the moving window approach, we use the first T observations in the initial period to estimate the model and to forecast the (T + 1)−th observation; the sample is then rolled forward by one observation so that the second to the (T + 1)−th observations are used to forecast the (T + 2)−th observation. This process is repeated until the end of the sample, the (T +m)−th observation. To evaluate the performance of the model on VaR prediction, the likelihood ratio test introduced in Kupiec (1995) is used to test that the null hypothesis that the expected proportion of the number of "beyond VaR" or "violation" during the test periods is equal to α. The violation is formulated by I t (α) = I[y T +1 < VaR t (α)] for the left tail and I t (α) = I[y t > VaR t (α)] for the right tail, where I[.] is an indicator function and VaR t (α) is the estimated VaR at level α, which can be obtained by simulation using the k − step ahead densities described below (See Chen et al., 2008;Fan and Wang, 2013, for a detailed review). Let x α be the number of violations, that is, The unconditional test of Kupiec (1995) is a likelihood ratio test with the χ 2 1 -distributed test statistic defined as The for the right tail. Following Aas and Haff (2006) and Nakajima (2013), we compute the measure developed by Embrechts et al. (2004) for evaluating the performance of the predicted ES, denoted by for the left tail and S t (α) = I[δ t (α) > δ α ] for the right tail. Write s α = ∑ T +m t=T +1 S t (α). The measure of Embrechts et al. (2004) is given by D As discussed in Aas and Haff (2006) and Nakajima (2013), D 1 (α) is the standard back-testing measure for expected shortfall estimates. Its weakness is that it depends strongly on the VaR estimates without adequately reflecting the correctness of these values. D 2 (α) is computed to correct it because D 2 (α) measures an average difference between the return and the estimated ES for the α-level tail of that difference from all test periods. A smaller D(α) implies more precise prediction of ES.

Bayesian Model Comparison
To compare the goodness of the estimated models with different members of the SMN distributions, we calculate the Watanabe-Akaike information criterion (WAIC, Watanabe, 2010Watanabe, , 2013. The WAIC is defined as where lppd is the log pointwise predictive density defined by and We can compute both the lppd and the p WAIC using the MCMC output. We label them as θ (i) and h The minimum value of the WAIC gives the best fit.

Empirical Application
We consider the daily closing prices for three Latin American indexes: BOVESPA (Brazil), MERVAL (Argentina) and MXX (Mexico) 1 . We also use the daily returns of S&P 500 stock market index in order to compare the results with Latin American stock markets. The datasets were obtained from the Yahoo finance web site available to download at http://finance.yahoo.
com. The period of analysis is from January 5, 1998, until December 30, 2016. Stock returns are computed as y t = 100 × (log P t − log P t−1 ), where P t is the (adjusted) closing price on day t. The sample size differs between countries due to holidays and stock market non-trading days. Table  1 reports a summary of descriptive statistics for the series of returns.
According with Table 1, MERVAL and SP500 returns are negatively skewed and the BO-VESPA and MXX positively skewed. Regarding kurtosis, all the indexes considered here show kurtosis greater than three, confirming a well-known stylized fact for return series, namely the departure from normality. We analyze the datasets with the aim of providing robust inference by using the SMN class of distributions. In our analysis, we compare the THSV-N, THSV-T, THSV-S and THSV-VG models for each one of the series described above.  In all posterior calculations, we simulate the h ′ t s using the mixture sampler described on Section 3. We set the priors distributions of the common parameters as (100, 100). N 2[.] (., ; ) and IG(., .) denote the truncated bivariate normal and inverse gamma distributions, respectively. For the shape parameters, we assume ν ∼ G(2, 0.10) in the THSV-T (Juárez and Steel, 2004), ν ∼ G(0.08, 0.04) for the THSV-S and THSV-VG respectively.
For all the models and datasets we considered, we generated 60000 MCMC iterations. In all the cases, the first 20000 draws were discarded as a "burn-in" period. In order to reduce the autocorrelation between successive values of the simulated chain, only every 20th values of the chain were stored. With the resulting 4000 values, we calculated the posterior means, the 95 % credibility intervals and the convergence diagnostic (CD) statistics. If the sequence of the recorded MCMC output is stationary, it converges in distribution to the standar normal. According to the CD, the null hypothesis that the sequence of 4000 draws is stationary is accepted at 5 % level, i.e. CD ∈ (−1.96, 1.96), for all the parameters in all the models and series of returns considered here (see Tables 2, 3, 4 and 5 for details.) From Tables 2, 3, 4 and 5, we found for all the models and datasets, that posterior means and 95 % credibility intervals of φ 0 and φ 1 were very close to the unity, which is consistent with the existing evidence of great persistence in the log-volatility process. Additionally, the posterior means of φ 0 and φ 1 under the THSV-N model were slightly smaller than those under the other three models. As expected, the posterior means of σ 2 0 and σ 2 1 under the THSV-N were higher than those under the THSV-T, THSV-S and the THSV-VG models, indicating that the log- volatility process of the last three models is less variable than that of the THSV-N model. We found that the persistence coefficients for the MERVAL index are lower than the coefficients of the other indexes. Under the THSV-T, THSV-S and THSV-VG models, the magnitude of the tail-heavyness is measured by the ν parameter. We found that the posterior means of ν were 24.2300, 1.7587 and 27.9404 (BOVESPA), 10.1110, 1.7690 and 6.5434 (MERVAL), 14.9193, 1.7572 and 11.3888 (MXX), 25.2610, 1.7532 and 9.3880 (SP500), respectively. This results seem to indicate the presence of heavy tails.
To assess the goodness-of-fit of the estimated models, we calculate the WAIC criterion for the SV-N, SV-T, SV-S, SV-VG 2 , THSV-N, THSV-T, THSV-S and THSV-VG models, respectively. From Table 6 the WAIC indicates the THSV-S model gives the best in-sample fit among all the models considered here, for BOVESPA, MERVAL, MXX and SP500 returns, suggesting a sufficient departure from the underlying assumption of normality and the presence of asymmetric behavior in the returns.
In Figure 1, we plot the absolute returns (gray line), the posterior smoothed mean of e h t 2 obtained from the MCMC output for the SV-N model (dotted black line) and the THSV-S model (dotted red line), which are the worst and the best model fit according to the WAIC criterion, for BOVESPA, MERVAL, MXX and SP500 returns. It is important to note that returns from Latin American indexes show a volatile period at the beginning of 1998, which is caused by brazilian crisis. All the indexes showed higher volatility between 2007 and 2008 as a consequence of the subprime crisis. MXX and SP500 showed a similar behaviour. From a practical point of view, we are mainly interested in whether we find a significant difference between the two series. Therefore, in the bottom panel of Figure 2, we plot the smoothed mean of the ratio of e h t 2 obtained from the SV-N and THSV-S models for BOVESPA, MERVAL, MXX and SP500 returns. Some extreme returns make the differences clear. This can have a substantial impact, for instance, in the evaluation of derivative instruments and several strategic or tactical asset allocation topics.
The magnitudes of the mixing parameter λ t are associated with extremeness of the corresponding observations. In the Bayesian paradigm, the posterior mean of the mixing parameter can be used to identify a possible outlier (see Rosa et al., 2003, for instance). The SV and THSV models with Student-t, slash and variance gamma distributions can accommodate an outlier by inflating the variance component for that observation in the conditional distribution with smaller λ t value. This fact is showed in Figure 3 where we plot the posterior mean of the mixing variable λ t for the THSV-S model (the best in-sample fit).
In order to examine the performance of VaR and ES forecast for the competing models, we use the data from January 3, 2017 to Febraury 1, 2019 as validation period, giving m trading days (m = 516, 508, 525, 524 trading days for BOVESPA, MERVAL MXX and SP500 returns, respectively). In the moving window approach, we use the first T observations in the period January 5, 1998 -December 30, 2016 to estimate the model and to forecast the (T + 1)−th observation; the sample is then rolled forward by one observation, so that the second to the  We thus obtain m volatility forecasts, VaR at level 5 %, and ES estimates with confidence levels of 5 % and 95 %. The competing models were: Riskmetrics, SV-N, SV-T, SV-S, SV-VG, THSV-N, THSV-T, THSV-S and THSV-VG.
The results of m one-step-ahead forecasts are presented in Table 7. For the BOVESPA returns, according to the unconditional coverage test, we reject the null hypothesis that the achieved violation rate is equal to 5 % for the Riskmetrics, SV-N and SV-S models at 5 % level. For the MERVAL returns, we reject the null hypothesis at 5 % level for Riskmetrics and SV-N model. For MXX and SP500 returns, we accept the null hypothesis for all models. According to the violation rate the SV-S and THSV-S give better performance than the other competing models, it is the P −values of SV-S and THSV-S models are greater than the other competing models. When, we compare D(0.05) and D(0.95), the THSV-S models show the smaller measures than the competing models for the BOVESPA, MERVAL and MXX returns. For the SP500 returns the SV-S show the smallest for the D(0.05) measure (see the values in bold in Table 7 for details).
In order to evaluate the prediction accuracy of the SV-N, SV-T, SV-S, SV-VG, THSV-N,THSV-T, THSV-S, THSV-VG models, we use the same moving window approach as in the Table 8  MSPE for BOVESPA, MERVAL, MMX and SP500 returns   Series  SV-N  SV-T  SV-S  SV-VG  THSV-N  THSV-T  THSV-S  THSV- where y (i,g) t is obtained by simulation using the MCMC procedure described in Section 3.2 and g ∈ {SV-N, SV-T, SV-S, SV-VG, THSV-N,THSV-T, THSV-S, THSV-VG} denotes the model.
From Table 8, we find that the THSV-S model outperforms the other models using the MSPE. So, according to the MSPE, the THSV-S model gives the best out-of-sample fit for all series of returns analyzed here. It is also important to emphasize that, in general, we do not advocate the use of the THSV-S model in all situations but recommend using the model discussed here to assess the robustness of the conclusions, replacing the normal assumption with a more flexible model if this provides a more appropriate analysis.

Discussion
In this article, we have proposed the threshold stochastic volatility model with scale mixture of normal distributions (THSV-SMN) errors as an alternative to the normal (symmetric) assumption in the conditional distribution of the returns. The THSV-SMN class of models allows a parsimonious yet flexible treatment of both the skewness and the heavyness of the tails of the error distribution. Within the Bayesian framework, we have developed a fast and efficient MCMC sampling procedure to estimate all the parameters and latent quantities in our proposed THSV-SMN class of models. As a by-product of the MCMC algorithm, we were able to produce an estimate of the latent information process which can be used in financial modeling. The use of mixing variables, λ 1∶T not only simplifies the full conditional distributions required for the Gibbs sampling algorithm, but also provides a mean for the outlier diagnostics. We applied our methods to the analysis of the BOVESPA, MERVAL, MXX and SP500 returns, which showed that the THSV-S model provides a better fit than the SV-N, SV-T, SV-S, SV-VG, THSV-N, THSV-T and THSV-VG terms of parameter estimates, interpretation and robustness aspects. We found that the THSV-S model outperforms the other models using the MSPE given the best out-of-sample fit and it can be used to VaR and ES forecast.
A potential interesting future research topic is the further investigation of the large observations by introducing jump components or correlation between perturbations of returns and volatilities.