Searching for the Best Inflation Forecasters within an Employment Survey: Microdata Evidence from Chile

This article aims to evaluate quantitative inflation forecasts for the Chilean economy, taking advantage of a specific survey of consumer perceptions at the individual microdata level, which, at the same time, is linked to a survey of employment in Chile's capital city. Thus, it is possible to link, without error, consumer perceptions and 12-month-ahead inflation forecasts with personal characteristics such as gender, age, educational level, county of living, and the economic sector in which respondents currently work. Using a sample ranging from 2005.II to 2018.IV, the results suggest that women aged between 35 and 65 years old, with a college degree, living in the North-eastern part of Santiago (the richest part of the city), and working in the Community and Social Services sector are the best forecasters. Two groups of men aged between 35 and 65 years old with a college degree tie for second place: one living in the South-eastern part of the city and working in Retail, and the other living in the North-eastern part and working in Government and Financial Services. Some econometric exercises reinforce the case for the most accurate group of forecasters and reveal that another group, different from the second-best in terms of forecast accuracy, displays the characteristics required of a forecasting variable. Remarkably, this group has the same specifications as the most accurate group, with the only difference being that it is composed of men instead of women, which makes it a promising candidate for further consideration. Importantly, a forecast accuracy test reveals that no factor is statistically superior to the naïve random walk forecast used as a benchmark. These results are important because they help to identify the most accurate group when forecasting inflation and, thus, help refine the information provided by the survey for inflation forecasting purposes.

of the respondents, which heavily relies on personal (yet anonymous) characteristics. 4 This is done by constructing sub-sets of inflation expectations factors with fully identifiable and mutually exclusive characteristics. These characteristics are gender, age, education, county of living, and the economic sector of present work, which are the exploitable personal attributes of the database. Other interesting attributes are available, but their sample sizes are too small to support reliable statistical inference.
The question of which group of consumers is better at inflation forecasting is not new. 5 However, the literature for emerging economies is scarce, particularly in the case of Chile. In this sense, this study constitutes the first analysis of Chilean inflation forecasts considering the respondents' personal, demographic, and geographic characteristics in an out-of-sample evaluation. A few studies, however, analyse Chilean inflation expectations inferred from both market-based financial securities and surveys, mainly to assess the degree of expectations anchoring. Despite using a very short sample span (2002-2005) from a then underdeveloped Chilean financial market, Gürkaynak et al. (2007) provide supporting evidence that breakeven inflation obtained from financial assets remained anchored under the recently implemented inflation targeting regime. Larraín (2007), in turn, finds that, over the same sample span, it is not clear-cut whether breakeven inflation is a good measure of inflation expectations. Using two financial models, the author finds that this market-based measure includes other components, such as risk and term premia, that could move in the opposite direction to inflation expectations. With a longer sample span from a more developed Chilean financial market, De Pooter et al. (2014) use survey-based measures and market-based measures from inflation-linked bonds to analyse the degree of anchoring in Brazil, Chile, and Mexico. The authors find that inflation expectations have become much better anchored from 2004 onwards in the three economies. In the case of Chile in particular, inflation expectations are less sensitive to economic news from China and the United States, which is interpreted as a strong degree of anchoring.
Along similar lines, Medel (2018) finds, based on a battery of econometric estimates, that inflation expectations from the Economic Expectations Survey conducted by the Central Bank of Chile display a low sensitivity to actual inflation readings. Using a dynamic stochastic general equilibrium (DSGE) model allowing for a time-varying learning mechanism of accumulated inflation forecast errors, Arias and Kirchner (2019) find that long-term expectations are insensitive to the arrival of new information in an episodic manner. That is, the probability that inflation expectations may become unanchored varies with accumulated inflation surprises, which is consistent with the learning mechanism in the proposed DSGE model. Finally, a recent review of Chilean inflation dynamics, including its degree of expectations anchoring, is presented in Central Bank of Chile (2020), with particular attention to the benefits of the inflation targeting regime adopted in 2000 in the form of a fixed target of 3% over a 2-year horizon.
answer). Although the question is asked freely (and this is how the answers appear in the original Stata files of the database), in the sense that respondents are not forced to frame their answers as a probability distribution of possible outcomes, the Survey of Perception and Expectations on the Economic Situation in Greater Santiago report presents the results in tranches of (-,2%), (2%,3%), (3%,4%), and (4%,+).
4 Naturally, other surveys in Chile ask different agents for their inflation expectations at different horizons.
Using a quarterly sample ranging from 2005.II to 2018.IV (55 observations), and different combinations of attributes leading to 648 different inflation forecasts (or "factors"), the results for total inflation suggest that women aged between 35 and 65 years old, with a college degree, living in the North-eastern part of the city (the area with the highest living standards in the country), and working in the Community and Social Services sector are the best forecasters of total inflation (i.e., the "winner factor"). Out of 3,060 consumers surveyed each quarter, this group comprises up to 26 consumers, i.e., the top 1% of forecasters. Two groups of men aged between 35 and 65 years old with a college degree tie for second place: one living in the South-eastern part of the city and working in Retail, and the other living in the North-eastern part and working in Government and Financial Services. Men aged between 35 and 65 years old, with a college degree, living in the North-western part of the city, and working in Government and Financial Services come in fourth place. From the fifth to the eighth places, I find men aged between 35 and 65 years old as a common characteristic, concentrated in Community and Social Services and Government and Financial Services, but with a different spatial distribution. All these results are compared with the naïve random walk (RW) forecast, which assumes no deeper knowledge of inflation dynamics and thus serves as a predictive benchmark, as in Kenny et al. (2014) using the University of Michigan's Survey of Consumers for the United States, Goyal and Parab (2019b) using the Consumer Confidence Survey for India, and Drakos et al. (2020) for 17 European countries using Eurobarometer's inflation forecasts. 6 Only these eight of the 648 possible forecasts outperform the benchmark, and none of them is statistically superior to the RW.
6 There are other reasons why the RW is an adequate benchmark. First, even if inflation is stationary, its forecasts could be more accurate using the RW model than, for instance, forecasts based on an estimated p-order autoregressive AR(p) model. This is because a highly persistent time series, such as inflation, spoils the estimation of near-unity coefficients (easily delivering suboptimal forecasts). In a finite sample, the estimation bias can be large enough to more than offset model uncertainty, so a misspecified model (such as the RW) can perform systematically better than an estimated AR(p). This is the key point of Medel and Pincheira (2016) and Pincheira and Medel (2016). Second, the RW is often referred to as "the naïve forecast" in the forecasting literature precisely because it does not require any knowledge of the underlying data generating process (nor of econometric forecasting or the current state of the macroeconomy). Knowing the latest available observation is all it requires. This is a reasonable criterion for discriminating between respondents who are informed and uninformed about the macroeconomy, without their necessarily being experts in inflation forecasting. Third, the RW forecast is the same for all horizons, which is presumably the case for a non-expert sample of individuals who associate "12-month ahead" with a blurry "future horizon", as mentioned in Chanut et al. (2019). Thus, this sort of "rational inattention" to the forecasting horizon is more likely to occur with a non-expert group of individuals, as is presumably the case with this dataset, or is at least an adequate assumption when respondents are asked to answer as "consumers". Finally, given that the respondents' knowledge of inflation is unknown, it is unclear which atheoretical model could serve as a better benchmark than the RW.
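The finite-sample estimation-bias argument above can be illustrated with a small Monte Carlo sketch. It simulates a persistent AR(1) process with an assumed coefficient of 0.95 and 55 observations (matching P, but otherwise an illustrative choice, not the article's setup) and shows that the OLS estimate of the autoregressive coefficient is biased downwards on average. It illustrates only the estimation bias, not the full RW-versus-AR(p) forecast comparison of Medel and Pincheira (2016).

```python
import random


def simulate_ar1(rho, n, burn_in, rng):
    """Simulate n observations of y_t = rho * y_{t-1} + e_t, e_t ~ N(0, 1),
    discarding the first `burn_in` draws so the path starts near stationarity."""
    y, path = 0.0, []
    for t in range(n + burn_in):
        y = rho * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(y)
    return path


def ols_ar1_slope(y):
    """OLS slope of y_t on a constant and y_{t-1}."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxz / sxx


def mean_ols_estimate(rho=0.95, n=55, reps=2000, seed=1):
    """Average OLS estimate of rho across `reps` simulated samples (assumed settings)."""
    rng = random.Random(seed)
    estimates = [ols_ar1_slope(simulate_ar1(rho, n, 200, rng)) for _ in range(reps)]
    return sum(estimates) / reps


if __name__ == "__main__":
    print(f"true rho = 0.95, mean OLS estimate = {mean_ols_estimate():.3f}")
```

The average estimate falls noticeably below the true coefficient, which is the kind of near-unity bias that can make an estimated AR model forecast worse than the RW in short samples.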
Also, several econometric exercises are conducted to further discriminate between the inflation factors that prove to be more accurate than the RW. These exercises (i.e., a brief comparison of key descriptive statistics, a regression-based bias analysis, forward- versus backward-lookingness estimates, a hybrid New Keynesian Phillips curve forecast comparison, and the U-Theil decomposition) reinforce the case for the winner inflation factor. However, these exercises also reveal that a factor different from the second-best in terms of forecast accuracy displays the characteristics required of a forecasting variable. Remarkably, this factor has the same specifications as the winner factor, with the only difference being that it is composed of men instead of women. The results presented in this article are important because they help identify the most accurate group when forecasting inflation and, thus, help refine the information provided by the survey for inflation forecasting purposes.
The rest of the article is organised as follows. The next section describes the econometric setup, comprising the dataset and the forecast evaluation framework. In Section 3, I present the results plus some econometric estimations to characterise and emphasise the differences between inflation forecasts, whereas in Section 4, I discuss three topics of interest in the light of the results: (i) the extent to which the characteristics of the best forecasters are similar to those suggested by the international evidence, (ii) why the mean is chosen instead of the median to represent individuals with the same characteristics, and (iii) an exploratory analysis of what type of price series the group with the least accurate results for total inflation is targeting. Finally, I conclude in Section 5.

Data
As mentioned above, I use the microdata made freely available (after online registration) by the Centro de Microdatos of Universidad de Chile (http://documentos.microdatos.cl/). This database results from merging two surveys: the Survey of Employment and Unemployment in Greater Santiago (the EOD) and the Survey of Perception and Expectations on the Economic Situation in Greater Santiago (the IEE). A unique feature is that both databases are available (anonymously) at the individual level and are already merged. Respondents are asked about their labour situation as workers and their economic perceptions as consumers. The independence of the answers to both surveys, especially those on sentiment, is ensured by the wording of the questions. Thus, by survey design, sentiment is not conditioned on the labour situation.
Notice that the inflation expectation question is embedded in an employment survey, which typically applies the same questionnaire with a fixed frequency to a previously defined sampling scheme composed of individuals. On some occasions, like the one at hand, the unit of analysis is the household and, therefore, representative expansion factors are used to match the total population. In the case of this survey, individuals are asked whether they are the head of their household ("Jefa/e de Hogar"), typically the main earner. If so, an expansion factor is applied according to their representativeness in the population. These expansion factors are representative according to the 2002 Census, as explained in Centro de Microdatos (2016). The next Census used to update the sample should have been that of 2012. However, relevant methodological flaws undermine the reliability of its results, and the subsequent summarised version conducted in 2017 does not provide the necessary and reliable information to update the survey's sampling scheme. Thus, I use the "whole sample" (i.e., that using expansion factors), both because this is how the data are most likely to be used, as an employment survey, and for lack of a better Census alternative. As an employment survey, it is more likely to be read (and used) in terms of "number of individuals" (i.e., the labour force) to then calculate the different rates and flows of the labour market (e.g., the unemployment rate).
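A minimal sketch of how expansion factors can enter a group statistic such as a factor's mean expectation. It assumes each respondent record carries its expected inflation and its expansion factor under the hypothetical field names `expectation` and `expansion_factor`; the article does not spell out its aggregation code, so this is one plausible weighted-mean implementation, not the author's.

```python
from collections import defaultdict


def weighted_group_means(records, group_key):
    """Expansion-factor-weighted mean of expected inflation by group.

    `records` is an iterable of dicts with the hypothetical keys
    'expectation' (expected 12-month inflation, in %) and
    'expansion_factor' (the survey weight); `group_key` maps a record
    to the label of the factor it belongs to.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for rec in records:
        group = group_key(rec)
        w = rec["expansion_factor"]
        weighted_sum[group] += w * rec["expectation"]
        weight_total[group] += w
    # Skip groups whose weights sum to zero to avoid dividing by zero.
    return {g: weighted_sum[g] / weight_total[g]
            for g in weighted_sum if weight_total[g] > 0}


if __name__ == "__main__":
    sample = [
        {"gender": "Women", "expectation": 2.0, "expansion_factor": 1.0},
        {"gender": "Women", "expectation": 4.0, "expansion_factor": 3.0},
        {"gender": "Men", "expectation": 3.0, "expansion_factor": 2.0},
    ]
    print(weighted_group_means(sample, lambda r: r["gender"]))
```

In the usage example, the second woman's answer carries three times the weight of the first, so the group mean is 3.5 rather than the unweighted 3.0.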
Naturally, the universe of the IEE is the same as that of the EOD: 7 inhabitants over 14 years old living in the Santiago Metropolitan Region and in the Puente Alto and San Bernardo counties. This adds up to about 40% of the Chilean population in 2017. The sample comprises 3,060 individuals per quarter, drawn by stratified random sampling with a panel data component: a rotating panel. In this method, part of the panel is kept permanently, and another part is an entirely new cross-section sample. The sample of 3,060 individuals in each quarter is divided into four subsamples of 765 individuals, where each subsample is independent and represents Greater Santiago. The rotation design is of the 2-2-2 type: selected individuals are interviewed in two consecutive rounds, are not contacted in the following two rounds, and are then interviewed again in the following two rounds, covering a period of 18 months in total. However, the unique identifier is not available to the public, and thus, it is only possible to conduct panel estimations considering a group of individuals with a common characteristic. The collection technique is face-to-face interviews, and the reported answer rate reaches 77.4% (as reported in March 2014). The representativeness with respect to the universe could be considered adequate: the survey fulfils all sampling requirements for Santiago, but not for the whole country, as it focuses on Santiago only. 8 The IEE has been released quarterly and fully available since March 2001 (75 observations available until September 2019).
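The 2-2-2 rotation can be sketched as a schedule of interview quarters. The sketch assumes that a new rotation group enters every quarter and that each group in the field corresponds to one of the four 765-person subsamples; this interpretation is consistent with 4 × 765 = 3,060 but is not stated explicitly in the text.

```python
def interview_quarters(entry_quarter):
    """Quarters in which a rotation group entering at `entry_quarter` is
    interviewed under a 2-2-2 design: two rounds in, two out, two in."""
    return [entry_quarter + k for k in (0, 1, 4, 5)]


def cohorts_in_field(quarter):
    """Entry quarters of the rotation groups interviewed in `quarter`,
    assuming a new group enters every quarter (steady state)."""
    candidates = range(quarter - 5, quarter + 1)
    return [e for e in candidates if quarter in interview_quarters(e)]


if __name__ == "__main__":
    first, *_, last = interview_quarters(0)
    print("months covered:", 3 * (last - first + 1))  # 6 quarters = 18 months
    print("groups in field at quarter 10:", cohorts_in_field(10))
```

Under these assumptions, each group is interviewed in the first, second, fifth, and sixth quarters after entry (spanning 18 months), and exactly four groups are in the field in any steady-state quarter.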
The merged database comprises a total of 142 variables. Out of this total, 18 are associated with the IEE, and 46 with the EOD. The remaining variables are answers on the household's income and debt-related issues and working variables for internal use. However, the financial variables are available over a shorter sample span, making it difficult to use them in this article. Moreover, some EOD series cannot be used for the same reason or because of a very low answer rate. This is the case of income and of the time the respondent has been working in the same job. These two questions would be particularly useful to discriminate between groups (income directly, and job tenure as a proxy for experience), and thus, other variables must fulfil this task.
The actual total inflation measure is presented in Figure 1a. Given that inflation rates are published at a monthly frequency, two versions of the quarterly series are analysed: the end-of-period rate (comparing the annual variation of the last month of each quarter with the same month of the previous year), and the average rate (comparing the annual variation of the 3-month average of the quarter with the same average of the previous year). The variables used to classify and compose the inflation factors are gender, age, education, county of living, and economic sector of current work. Notice that the categories of each of these attributes are mutually exclusive. I consider all individuals who respond "Working" to the "Employment Situation" question because this leads to a more meaningful result from which it is possible to draw some economics-based conclusions. 9 However, given the extensive answer classification in some of these questions, some factors deliver empty entries in quarters where no individuals fulfil the classification. To avoid this problem, I re-code some variables according to the key of Table 1.
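The two high-to-low frequency transformations can be sketched as follows, assuming a monthly CPI index that starts in the first month of a quarter. The average rate is computed here as the annual change of the quarter's average index level, which is one reading of the description above.

```python
def quarterly_inflation(index):
    """Map a monthly price index to quarterly annual inflation rates (%).

    `index` holds monthly CPI levels starting in the first month of a
    quarter; its length must be a multiple of 3. Returns two lists with one
    value per quarter from the fifth quarter onward:
      - end-of-period: last month of the quarter vs the same month a year earlier;
      - average: 3-month average index of the quarter vs the same average a year earlier.
    """
    if len(index) % 3:
        raise ValueError("index must cover whole quarters")
    quarters = [index[i:i + 3] for i in range(0, len(index), 3)]
    eop, avg = [], []
    for q in range(4, len(quarters)):
        cur, prev = quarters[q], quarters[q - 4]
        eop.append(100.0 * (cur[-1] / prev[-1] - 1.0))
        # The ratio of sums equals the ratio of 3-month averages.
        avg.append(100.0 * (sum(cur) / sum(prev) - 1.0))
    return eop, avg


if __name__ == "__main__":
    # Flat prices for 14 months, then a 6% jump in the 15th month.
    eop, avg = quarterly_inflation([100.0] * 14 + [106.0])
    print(eop, avg)
```

In the usage example, the price jump in the last month of the quarter shows up fully in the end-of-period rate (6%) but only partially in the average rate (2%), which is why both versions are worth analysing.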
The detailed definition of each economic sector is found in Centro de Microdatos (2009). It is worth mentioning the main differences between the two most apparently similar economic sectors: Personal and Household Services and Community and Social Services. While the former comprises repair services, laundries and laundry services, cleaning and dyeing establishments, domestic services, and personal and miscellaneous services (e.g., hairdressers and beauty salons, photo studios), the latter includes sanitation, educational, and health services, social welfare institutions, entertainment and leisure services, and other communal and social services (e.g., retail, professional, and labour associations; religious organisations).
As mentioned above, I consider all "Working" answers to the "Employment Situation" question, corresponding to 59.1% of total answers on average over 2005-18 (see Figure 1b). The second and third places in terms of number of answers are "Housework" and "Others", for which it is more difficult to delve into the characteristics leading to more accurate forecasts, for instance, the economic sector in which they worked (if applicable). More prohibitive, however, is the limited sample available for the inflation expectation question, which prevents arriving at a reliable factor to be compared with those under the "Working" category.
In terms of gender and age, Figure 1c suggests that the [35,65] years old (re-coded) tranche is the biggest for both men and women, whereas the survey is biased towards women in all the original tranches. This fact could be problematic, but the other attributes could partially correct this bias by refining the groups with a sample closer to the population.
Regarding education, Figure 1d displays the distribution of the three tranches of education, re-coded to strengthen inflation factors with a more aggregate definition of "education". Notice that they are displayed using the original eight zones in which the IEE is compiled, showing a greater concentration of college degrees in the North-eastern part of Santiago and lower levels of education in the Western part of the city. To obtain factors with more observations, the eight zones of Santiago defined and surveyed by the IEE are re-coded following a cardinal representation: South-western, South-eastern, North-eastern, and North-western. According to Figure 1e, the cardinal zones are balanced in terms of total answers, except for the North-eastern zone. However, the correct representation is ensured through the different attributes shaping the inflation factors, and all of them are then evaluated in the same manner. Finally, the sample distribution across economic sectors is presented in Figure 1f. There is no re-coding for this variable, and Retail (2005-18 average: 21.1%), Community and Social Services (18.9%), Government and Financial Services (18.5%), Manufacturing (13.8%), and Personal and Household Services (10.1%) are the biggest sectors, each with a double-digit weight in the total sample of the IEE.
Despite the re-coding of some variables to strengthen the factors, some of them may still have no entry in some quarters. Therefore, I require each of the 648 factors to have at least 50 of the 55 possible observations. This criterion leaves out 55 of the 648 factors, which are listed in Figure B.1 in Appendix B. Also, given the moderate number of factors, it is possible to proceed with a brute-force exercise testing all factors instead of an algorithm-based search for the attributes leading to the best out-of-sample results.
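The brute-force enumeration of factors and the coverage rule (at least 50 of the 55 quarterly observations) can be sketched as follows. The sketch assumes three age tranches and nine economic sectors, which is the split consistent with 648 = 2 × 3 × 3 × 4 × 9 given two genders, three education tranches and four cardinal zones; the article does not list them here, so the age, education and sector labels below are hypothetical placeholders.

```python
from itertools import product

# Gender and the four cardinal zones follow the text; the age, education and
# sector labels are hypothetical placeholders (3 x 3 x 9 categories assumed).
ATTRIBUTES = {
    "gender": ["Men", "Women"],
    "age": ["Age1", "Age2", "Age3"],
    "education": ["Edu1", "Edu2", "Edu3"],
    "zone": ["South-western", "South-eastern", "North-eastern", "North-western"],
    "sector": [f"Sector{i}" for i in range(1, 10)],
}


def all_factor_masks():
    """Every '[Gender]-[Age]-[Education]-[Zone]-[Sector]' combination."""
    return ["-".join(combo) for combo in product(*ATTRIBUTES.values())]


def keep_factor(series, min_obs=50):
    """True if a factor's quarterly series has at least `min_obs` non-empty
    entries (empty quarters are represented as None)."""
    return sum(v is not None for v in series) >= min_obs


if __name__ == "__main__":
    masks = all_factor_masks()
    print(len(masks), masks[0])
```

Because the number of combinations is small, evaluating every mask directly is cheap, which is what makes the brute-force approach preferable to an algorithmic search here.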

Forecast Evaluation Framework
The forecast evaluation statistic used is the root mean squared forecast error (RMSFE), defined as:

$$\mathrm{RMSFE} = \sqrt{\frac{1}{P}\sum_{\tau}\left(\pi_{\tau+4}-\pi_{\tau+4|\tau}\right)^{2}}$$

where $\pi_{\tau+4|\tau}$ represents the transformed 4-quarter-ahead forecast of $\pi_{\tau+4}$ made with information known until time $\tau$. I have a total of P = 55 forecasts, ranging from 2005.II to 2018.IV at a quarterly frequency. For simplicity, the results are reported using a relative measure of the RMSFE, easing comparison across the alternative forecasts:

$$\text{RMSFE ratio} = \frac{\mathrm{RMSFE}_{\text{Factor}}}{\mathrm{RMSFE}_{\text{RW}}}$$

where "Factor" stands for each of the available factors; thus, values below one imply a better performance in favour of the consumer-based factor.
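A minimal sketch of the RMSFE, the RW benchmark, and the RMSFE ratio, assuming the survey forecasts are already transformed to quarters and aligned on the target quarter τ + 4. The helper names are hypothetical.

```python
import math


def rmsfe(actual, forecast):
    """Root mean squared forecast error over P paired observations."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need equally long, non-empty series")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def rw_forecasts(inflation, h=4):
    """Random-walk forecasts: the h-quarter-ahead forecast made at tau is pi_tau.
    Returns (targets, forecasts) aligned on tau + h."""
    return inflation[h:], inflation[:-h]


def rmsfe_ratio(actual, factor_forecast, rw_forecast):
    """RMSFE of the factor relative to the RW; values below one favour the factor."""
    return rmsfe(actual, factor_forecast) / rmsfe(actual, rw_forecast)


if __name__ == "__main__":
    infl = [3.0, 3.5, 4.0, 2.5, 3.0, 3.2, 2.8, 3.1]
    targets, rw = rw_forecasts(infl)
    factor = [x + 0.1 for x in targets]  # a hypothetical, nearly perfect factor
    print(round(rmsfe_ratio(targets, factor, rw), 3))
```

The RW needs only the latest observed quarter, so its forecast for τ + 4 is simply the inflation rate observed at τ.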
To investigate to what extent the predictive gains are statistically significant, I use the unconditional one-sided t-type Giacomini and White (2006; GW) test, which has the advantage of comparing forecasting methods instead of forecasting models. As the alternative hypothesis is that the competing forecast has superior predictive ability compared with the RW, a one-sided t-type GW statistic is used accordingly.
Formally, with the loss differential $d_{\tau} = \left(\pi_{\tau+4}-\pi^{\text{RW}}_{\tau+4|\tau}\right)^{2} - \left(\pi_{\tau+4}-\pi^{\text{Factor}}_{\tau+4|\tau}\right)^{2}$, I test:

$$H_{0}: \mathbb{E}\left[d_{\tau}\right] = 0 \quad \text{versus} \quad H_{A}: \mathbb{E}\left[d_{\tau}\right] > 0$$

using the Newey and West (1987) heteroskedasticity and autocorrelation consistent estimator of the standard deviation of $\bar{d}$. $H_{0}$ is rejected if the resulting t-statistic is greater than $t_{\alpha\%}$, the critical value of the standard normal distribution at the α% significance level.
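The one-sided unconditional test can be sketched with squared-error loss, defining the loss differential as d = e_RW² − e_Factor² so that positive values favour the factor. The Bartlett-kernel (Newey-West) variance uses an assumed default of 3 lags (h − 1 for 4-quarter-ahead forecasts); this is an illustrative implementation, not the author's code.

```python
import math


def newey_west_variance(d, lags):
    """Bartlett-kernel (Newey-West) long-run variance of the series d."""
    p = len(d)
    mean = sum(d) / p
    dev = [x - mean for x in d]

    def gamma(j):
        # Sample autocovariance at lag j (divided by p).
        return sum(dev[t] * dev[t - j] for t in range(j, p)) / p

    lrv = gamma(0)
    for j in range(1, lags + 1):
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma(j)
    return lrv


def gw_unconditional_one_sided(e_rw, e_factor, lags=3):
    """t-statistic and p-value for H0: E[d] = 0 against HA: E[d] > 0,
    where d = e_rw**2 - e_factor**2 (positive d favours the factor)."""
    d = [a * a - b * b for a, b in zip(e_rw, e_factor)]
    p = len(d)
    lrv = newey_west_variance(d, lags)
    if lrv <= 0.0:
        raise ValueError("non-positive long-run variance")
    t_stat = (sum(d) / p) / math.sqrt(lrv / p)
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # 1 - Phi(t_stat)
    return t_stat, p_value


if __name__ == "__main__":
    e_rw = [2.0, -1.0, 3.0, -2.0, 1.0, -3.0, 2.0, -1.0]
    e_factor = [0.5, 0.2, -0.4, 0.1, -0.3, 0.2, 0.1, -0.2]
    print(gw_unconditional_one_sided(e_rw, e_factor))
```

Rejection at the α% level corresponds to a p-value below α/100; with only 55 forecasts, the Newey-West variance can be imprecise, which is one reason the test may fail to reject even when the RMSFE ratio is below one.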

RMSFE Results
The RMSFE ratio results are presented in Table 2 for both the end-of-period and the average high-to-low frequency transformations. As mentioned above, figures below one favour the consumer-based inflation factor. Factors are coded with the mask "[Gender]-[Age group]-[Education]-[Cardinal region]-[Economic sector]" and shown in lexicographic order. Also, those performing better than the RW are labelled as Fj, where j is the ranking according to the RMSFE results, as shown in Table 3. A further visual comparison can be found in Appendix C. A remarkable result is that eight factors outperform the RW with both the end-of-period and average versions, even in a similar (but not equal) ranking. In both cases, the winning factor comprises women aged between 35 and 65 years old, with a college degree, living in the North-eastern part of the city, and working in the Community and Social Services sector. This group is composed of up to 26 consumers, that is, the top 1% of forecasters. For an overview of its performance, Figure 2 displays the scatter plot of the actual inflation series and the F1 winner factor. It is important to note that all points far from the 45° line are those from mid-2007 to mid-2009, when inflation became more volatile due to the 2008-09 Global Financial Crisis. Despite this episode, the factor tends to show a concentration of accuracy, particularly when inflation is close to the target. In second place, still for the end-of-period transformation, come two tied factors composed of men aged between 35 and 65 years old with a college degree: one living in the South-eastern part of the city and working in Retail, and the other living in the North-eastern part and working in Government and Financial Services.
Table 3: Inflation factors with an RMSFE smaller than the RW.
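The labelling rule (keep the factors with an RMSFE ratio below one and number them by rank) can be sketched in a few lines. How the article breaks ties, such as the one for second place, is not stated; the sketch assumes tied factors keep the lexicographic order of their masks, and the masks in the usage example are hypothetical.

```python
def label_winners(ratios):
    """Label the factors that beat the RW as F1, F2, ... by RMSFE ratio.

    `ratios` maps a factor mask to its RMSFE ratio against the RW.
    """
    # Start from lexicographic order, then sort by ratio; Python's sort is
    # stable, so tied factors keep their lexicographic order.
    winners = sorted(mask for mask, r in ratios.items() if r < 1.0)
    winners.sort(key=lambda mask: ratios[mask])
    return {f"F{j}": mask for j, mask in enumerate(winners, start=1)}


if __name__ == "__main__":
    example = {"Men-A": 0.97, "Men-B": 1.04, "Women-A": 0.95, "Men-C": 0.97}
    print(label_winners(example))
```

In the usage example, "Men-B" is dropped because its ratio exceeds one, and the two factors tied at 0.97 are labelled F2 and F3 in lexicographic order.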
Then, in fourth place are men aged between 35 and 65 years old, with a college degree, living in the North-western part of the city, and working in Government and Financial Services. From the fifth to the eighth places, I find men aged between 35 and 65 years old as a common characteristic, concentrated in Community and Social Services and Government and Financial Services, but with an uneven spatial distribution. Only these eight factors come out as superior to the RW. The same ranking is observed for the average transformation, except for a swap between the sixth and seventh places.
A major drawback of these results is that, according to the Giacomini-White test, none of these eight factors is statistically superior to the RW. Notice that a common characteristic of the eight best factors is that all of them are composed of consumers aged [35,65] with a college degree. They are also composed of men, except for the winner factor. Regarding the economic sectors, despite this being the attribute with the most categories available, the best results are concentrated in just two sectors: Government and Financial Services and Community and Social Services. Finally, there is no clear pattern regarding spatial location.

Complementary Econometric Results
After finding the most promising factors, that is, those exhibiting a better forecasting performance than the RW (Table 3), I conduct econometric exercises to further discriminate between them. The first exercise is the simplest one and compares practical-use descriptive statistics. The results are presented in Table 4. The factors closest to the total end-of-period mean are F3, F5, and F6, whereas those closest to the median are F5, F3, and F1. When comparing the standard deviations, F3 and F5 again come out with satisfactory results, which are replicated for the percentage of time within the (2,4) interval. In this sense, F3 and F5 display in-sample diagnostics similar to those of total end-of-period inflation, but it is relevant to delve into their predictive features with the remaining exercises. The second exercise compares the biases of the forecasting series. Expectations are unbiased (i.e., they show no room for improvement) if they are, on average, equal to actual inflation. This translates into the regression:

$$\pi_{\tau} = \alpha + \beta\,\pi_{\tau|\tau-4} + \varepsilon_{\tau}$$

where $\pi_{\tau|\tau-4}$ is the 12-month-ahead forecast (transformed to quarters) made four quarters earlier, so that it is comparable to actual inflation $\pi_{\tau}$; α and β are parameters to be estimated; and $\varepsilon_{\tau}$ is an error term that could be autocorrelated, in which case a 3-term moving average is used. Thus, the lack of bias simultaneously means that α = 0 and β = 1, which is tested through a Wald-type F-test.
Table 2 sorted according to its RMSFE ratio as in Table 3. Bold estimates show statistically significant coefficients at the 10% significance level. p-values are shown in parentheses. "Adj. R-sq." stands for the adjusted goodness-of-fit coefficient. "F-test (α = 0, β = 1)" stands for the F-statistic of the null hypothesis α = 0 and β = 1. The frequency transformation implies replacing the t + 12 subscript with τ + 4.
If the null hypothesis NH: α = 0, β = 1 is rejected, the forecast is biased, and there is room for improvement; otherwise, the prediction is said to be efficient in using the information available up to the release of the latest observation.
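The unbiasedness regression and its joint Wald test can be sketched with plain OLS. This is a minimal version under stated assumptions: the caller passes actual inflation and the forecast made four quarters earlier, already aligned; the errors are treated as homoskedastic and serially uncorrelated (the MA correction used in the article is omitted); and the F-statistic is compared with an F(2, n − 2) critical value outside the function.

```python
def unbiasedness_test(actual, forecast):
    """OLS of actual inflation on a constant and the aligned forecast, plus the
    Wald F-statistic for H0: alpha = 0 and beta = 1.

    Assumes homoskedastic, serially uncorrelated errors (no MA term).
    Returns (alpha, beta, f_stat).
    """
    n = len(actual)
    if n != len(forecast) or n < 3:
        raise ValueError("need two aligned series with at least 3 points")
    mx = sum(forecast) / n
    my = sum(actual) / n
    sxx = sum((x - mx) ** 2 for x in forecast)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecast, actual))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [y - alpha - beta * x for x, y in zip(forecast, actual)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance, n - 2 dof
    # Wald statistic with R = I (2x2) and r = (0, 1):
    # F = (b - r)' X'X (b - r) / (2 * s2), X'X = [[n, sum(x)], [sum(x), sum(x**2)]].
    da, db = alpha, beta - 1.0
    sum_x = sum(forecast)
    sum_x2 = sum(x * x for x in forecast)
    quad = n * da * da + 2.0 * sum_x * da * db + sum_x2 * db * db
    return alpha, beta, quad / (2.0 * s2)


if __name__ == "__main__":
    fc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
    biased = [2.0 + 0.5 * x + e for x, e in zip(fc, noise)]
    print(unbiasedness_test(biased, fc))
```

A large F-statistic, as in the biased usage example, rejects the joint hypothesis and signals room for improvement; an intercept-only departure would point towards the intercept-correction idea discussed next.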
The results are presented in Table 5. They suggest that the null hypothesis of unbiasedness is rejected for all factors and, thus, there is room for improvement in all of them. However, the results for factors F1 and F2 suggest that they are more efficient in using information because their only statistically significant coefficient is the intercept, which is unrelated to actual inflation. Thus, for instance, a less invasive intercept-correction method could deliver better results for them than for the remaining factors, which require more information to achieve efficiency. This is the case of the F3 and F5 factors, in which β is statistically significant (0.258 and 0.149, respectively), and thus, they do not fully incorporate the information contained in actual inflation. Notice, however, that this evaluation aims to identify the characteristics of the group that best forecasts inflation, not only to assess forecast accuracy: further exploration of these factors, particularly the well-behaved ones, could deliver the best results after several statistical treatments and factor combinations, a task left for further research.
The third exercise, which encompasses the bias results when present, asks to what degree the forecast variables are forward-looking versus backward-looking. A static version of this test is represented by the regression: where λ and γ are parameters to be estimated, and $\upsilon_{\tau}$ is an error term that could be autocorrelated, in which case a 3-term moving average is used. Since the γ-parameters are constrained to add up to unity, a relatively greater γ parameter on the forward-looking term reflects a higher degree of forward-lookingness of the factor, which is desirable for a forecasting variable. As Łyziak (2013) posits, a γ = 1 parameter suggests that inflation forecasts are fully forward-looking and meet the unbiasedness requirement of the rational expectations hypothesis. In contrast, if γ is not statistically different from zero, inflation forecasts are fully backward-looking, which makes them very easy to outperform and, thus, gives them very little informational content. The results are presented in Table 6. The F1 and F5 factors come out as the best options, showing a forward-looking share close to 50%. Remarkably, neither of the estimated coefficients is significant for the F7 and F8 factors. This result is unexpected given their satisfactory results in the other exercises. So far, then, F1 (recall that it is the best factor according to the RMSFE ratio) and F5 are the factors with the desirable features of a forecasting variable, and the only difference in their composition is that F1 is composed of women, whereas F5 is composed of men.
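The constraint that the γ weights add up to one can be imposed by substitution, which turns the regression into a single-slope OLS problem. The sketch below does this under stated assumptions: the forward-looking regressor is the realised inflation at the forecast's target date, the backward-looking regressor is a past inflation reading supplied by the caller (the article's exact lag structure is not reproduced here), and the MA error correction is ignored.

```python
def forward_lookingness(forecast, future_actual, past_actual):
    """Constrained OLS of forecast = lam + g * future + (1 - g) * past + u.

    Imposing that the two weights add up to one, the model is rewritten as
    (forecast - past) = lam + g * (future - past) + u and fitted by OLS.
    Returns (lam, g): g near 1 means forward-looking, near 0 backward-looking.
    """
    y = [f - p for f, p in zip(forecast, past_actual)]
    x = [a - p for a, p in zip(future_actual, past_actual)]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        raise ValueError("future and past inflation must not differ by a constant")
    g = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - g * mx, g


if __name__ == "__main__":
    future = [3.0, 2.5, 4.0, 3.5, 2.0, 3.0]
    past = [2.0, 3.0, 3.5, 2.5, 3.0, 4.0]
    # A hypothetical factor putting equal weight on both terms.
    factor = [0.2 + 0.5 * a + 0.5 * b for a, b in zip(future, past)]
    print(forward_lookingness(factor, future, past))
```

In the usage example, the recovered weight is 0.5, i.e., a forward-looking share of 50%, the order of magnitude reported for F1 and F5.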
Regarding the fourth exercise, much empirical research has been conducted on the Phillips curve. Particularly after Galí and Gertler (1999) proposed the hybrid New Keynesian Phillips curve, many authors have fitted versions of it that include direct measures of inflation expectations for forecasting purposes. 10 The success of this version of the Phillips curve relies on the mixture of two features of price dynamics, namely, inflation persistence, captured with lagged inflation, and the forward-looking price formation of firms, captured by direct measures of expectations. This is complemented by a cost-push measure, such as the output gap, reflecting inflationary pressures from the real economy. Thus, if the inflation factors act as inflation expectations, they should be statistically significant in this setup, as argued in Łyziak (2013). Therefore, the fourth exercise consists of using each promising inflation factor in a hybrid New Keynesian Phillips curve described as: where $\tilde{y}_{\tau}$ is the output gap, 11 π, ρ, and φ are parameters to be estimated, and $\psi_{\tau}$ is an error term that could be autocorrelated, in which case a 3-term moving average is used. The criteria to determine which inflation factor is preferable are based on both the statistical significance of the associated coefficient and the improvement in the model's goodness-of-fit. A third criterion is based on the 4-quarter-ahead forecast obtained with each estimated version of equation (6). The results are presented in Table 7. All factors display statistically significant results and, thus, the discriminatory power seems low. However, the differences are more noticeable when making predictions with these estimates (actual end-of-period inflation four quarters ahead). In particular, the F2 and F7 factors display the lowest RMSFE, followed by the F3 and F5 factors, and then the F4 and F1 factors.
10 See, for instance, Paloviita and Mayes (2005) for 11 European countries, Nason and Smith (2008)
Thus, neither the F1- nor the F5-based hybrid New Keynesian Phillips curve ranks among the top three in terms of inflation forecasts, with the caveat that no statistical inference is carried out between them. These results could be explained by the slightly greater output gap coefficient allowed by the factors leading to the best forecasting results, and by the difficulty of estimating coefficients when the covariates are more correlated with each other.
11 The approximation used here for the output gap is obtained as the difference between the logarithmic level of actual GDP and potential GDP. The latter is defined as the logarithmic level of the seasonally adjusted and filtered GDP series, extended with up to five years of forecast observations from an ad-hoc autoregressive integrated moving average (ARIMA) model. This last step is performed to avoid the "end-of-sample" identification problem when using the Hodrick-Prescott (λ = 1,600) method to filter the series. The seasonal adjustment program used is X-13ARIMA-SEATS, whereas the ARIMA forecasting model is the so-called airline model (Box and Jenkins, 1970; Ghysels et al., 2006).
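The output-gap construction of footnote 11 and the Phillips-curve regression of the fourth exercise can be sketched together. This is a minimal illustration under stated assumptions: the Hodrick-Prescott trend is computed by solving the standard penalised least-squares system, the airline-model forecasts are replaced by a hypothetical drift extrapolation (`drift_forecast`), and equation (6) is assumed to take the form π_τ = c + a·π_{τ−1} + b·F_τ + d·ỹ_τ + ψ_τ estimated by OLS without the moving-average correction. All function names are illustrative.

```python
import numpy as np


def hp_trend(y, lamb=1600.0):
    """Hodrick-Prescott trend: solve (I + lamb * D'D) trend = y, D = second differences."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.zeros((n - 2, n))
    for i in range(n - 2):
        d[i, i:i + 3] = (1.0, -2.0, 1.0)
    return np.linalg.solve(np.eye(n) + lamb * (d.T @ d), y)


def drift_forecast(log_gdp, horizon=20):
    """Hypothetical stand-in for the airline-model forecasts: extend at the mean growth rate."""
    log_gdp = np.asarray(log_gdp, dtype=float)
    drift = np.mean(np.diff(log_gdp))
    return log_gdp[-1] + drift * np.arange(1, horizon + 1)


def output_gap(log_gdp_sa, forecasts):
    """Gap = log GDP minus HP trend, with the trend computed on the forecast-padded series."""
    log_gdp_sa = np.asarray(log_gdp_sa, dtype=float)
    padded = np.concatenate([log_gdp_sa, np.asarray(forecasts, dtype=float)])
    potential = hp_trend(padded)[: log_gdp_sa.size]  # drop the padded tail
    return log_gdp_sa - potential


def fit_hybrid_nkpc(inflation, factor, gap):
    """OLS of pi_t = c + a*pi_{t-1} + b*F_t + d*gap_t + e_t (assumed form of equation (6)).

    Returns coefficients [c, a, b, d], conventional OLS t-statistics and R-squared.
    """
    pi = np.asarray(inflation, dtype=float)
    y = pi[1:]
    x = np.column_stack([
        np.ones(y.size),
        pi[:-1],
        np.asarray(factor, dtype=float)[1:],
        np.asarray(gap, dtype=float)[1:],
    ])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    dof = y.size - x.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(x.T @ x)
    t_stats = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, t_stats, r2
```

Padding the series before filtering is what mitigates the end-of-sample problem: the last in-sample trend values are no longer the filter's endpoints. Comparing factors then amounts to checking the t-statistic on b and the R-squared across fits.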
Finally, the fifth exercise consists of the so-called U-Theil decomposition of forecast errors into bias, regression, and disturbance proportions. This decomposition aims to disentangle forecast errors and classify them by source, so that, depending on the source, an improvement strategy can be deployed. As Ahlburg (1984) states, following Theil (1971), the decomposition of the mean squared forecast error (MSFE) is:
$$\text{MSFE} = \left(\bar{\pi}^{\tau+4}_{\tau} - \bar{\pi}_{\tau+4}\right)^{2} + \left(\sigma_{\pi^{\tau+4}_{\tau}} - \rho\,\sigma_{\pi_{\tau+4}}\right)^{2} + \left(1 - \rho^{2}\right)\sigma^{2}_{\pi_{\tau+4}}$$
where $\bar{\pi}^{\tau+4}_{\tau}$ and $\bar{\pi}_{\tau+4}$ are the means of the predicted and actual values of the inflation series, respectively, $\sigma_{\pi^{\tau+4}_{\tau}}$ and $\sigma_{\pi_{\tau+4}}$ are their standard deviations, and ρ is the correlation coefficient between the predicted and actual values of the series. The three terms are, respectively, the bias, regression, and disturbance components. The bias proportion arises from the systematic under- or over-estimation of the mean of the target variable. The regression proportion, in turn, arises from the deviation from unity of the slope coefficient in a regression of the actual values on their forecasts. These two sources of forecast error are systematic and could be reduced to improve accuracy. In contrast, the disturbance proportion is the non-systematic error, which leaves no role for the forecaster. To ease comparison, the three components are expressed as shares of the MSFE, making them comparable across inflation factors. A higher disturbance proportion thus signals that the forecasters behind a factor use information more efficiently. Notice that this evaluation is made with the inflation factors as they are, without any correction or combination. This is relevant because some factors that perform worse than the RW in a first step might, after an eventual second step of correction, outperform some factors with promising initial accuracy. This kind of analysis (data mining for forecasting purposes), while useful, goes beyond the scope of this article. Moreover, combination or accuracy-enhancing techniques would blur the true contribution of a specific group to forecast accuracy, making it difficult to identify.
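The Theil decomposition described in the text can be computed directly from a factor's forecasts and the realised inflation series. The sketch below is illustrative (the function name is hypothetical); it uses population (1/n) moments, which is the normalisation under which the bias, regression, and disturbance terms add up exactly to the MSFE, and returns each term as a share of the MSFE.

```python
import numpy as np


def theil_decomposition(forecast, actual):
    """Split the MSFE into bias, regression and disturbance shares (summing to one)."""
    f = np.asarray(forecast, dtype=float)
    a = np.asarray(actual, dtype=float)
    msfe = np.mean((f - a) ** 2)
    sf, sa = f.std(), a.std()  # population (ddof=0) standard deviations
    rho = np.corrcoef(f, a)[0, 1]
    bias = (f.mean() - a.mean()) ** 2
    regression = (sf - rho * sa) ** 2
    disturbance = (1.0 - rho ** 2) * sa ** 2
    return {
        "msfe": msfe,
        "bias": bias / msfe,
        "regression": regression / msfe,
        "disturbance": disturbance / msfe,
    }
```

For example, a forecast that is the actual series shifted up by one point has an MSFE of one that is entirely bias: its regression and disturbance shares are zero.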
The results are presented in Table 8. As mentioned above, a greater disturbance proportion is desirable from the standpoint of this evaluation, as it aims to reveal how efficiently the respondents use information and form expectations without any ex-post statistical intervention. Consequently, the F1 factor comes out as the best alternative because it shows the largest disturbance share and, by construction, the smallest combined share of bias and regression errors. In these terms, the F5 factor is the best candidate for correction, as it displays the highest regression proportion.
A final in-depth analysis of the F1 and F5 factors consists in hand-picking observations in which both perform poorly and treating them as missing observations. On the one hand, for the F1 factor, the observation with the largest deviation is found in 2009.IV (a forecast of 6.0% when actual inflation was -2.1%). Dropping this observation reduces the RMSFE ratio from 0.811 to 0.807. On the other hand, for the F5 factor, two major deviations are noticed: one in 2009.IV (6.0% versus -2.1% actual) and another in 2010.IV (7.5% versus 2.0% actual). Dropping these observations reduces the RMSFE ratio from 0.902 to 0.878, a fall that is not enough to outperform the F1 factor. Therefore, the F1 factor still stands as the best factor. It is also remarkable that, during the 2008-09 Global Financial Crisis, the F1 factor recorded an inflation forecast of 10.1% while the effective rate was 9.9%, whereas the F5 factor registered a wider difference, recording a rate of 4.8%. 12 In sum, according to the complementary econometric exercises, the F1 factor (W-[35,65]-College-NEast-Retail) is consistently the best in terms of accuracy and of the features expected from a forecasting variable. However, the F5 factor (M-[35,65]-College-NEast-Retail), despite not being the second-best in terms of accuracy measured through the RMSFE ratio, comes out as a valid option, fulfilling the behaviour expected of a forecasting variable.
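The effect of hand-picking observations on the RMSFE ratio can be illustrated with a small helper. This is a sketch under an assumption the text does not spell out: dropped quarters are removed from both the factor and the random-walk benchmark, so the ratio is always computed on a common sample. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def rmsfe_ratio(factor, random_walk, actual, drop=()):
    """RMSFE of a factor divided by the RMSFE of the random-walk benchmark.

    Positions listed in `drop` are treated as missing for both forecasts
    (assumed handling: the benchmark is evaluated on the same reduced sample).
    """
    f = np.asarray(factor, dtype=float)
    rw = np.asarray(random_walk, dtype=float)
    a = np.asarray(actual, dtype=float)
    keep = np.ones(a.size, dtype=bool)
    keep[np.asarray(drop, dtype=int)] = False
    rmsfe_f = np.sqrt(np.mean((f[keep] - a[keep]) ** 2))
    rmsfe_rw = np.sqrt(np.mean((rw[keep] - a[keep]) ** 2))
    return rmsfe_f / rmsfe_rw


if __name__ == "__main__":
    actual = np.array([1.0, 2.0, 3.0, 4.0])
    factor = actual + np.array([0.1, -0.1, 0.1, 5.0])  # one large miss in the last quarter
    rw = actual + 1.0
    print(rmsfe_ratio(factor, rw, actual), rmsfe_ratio(factor, rw, actual, drop=[3]))
```

A single large miss can dominate the ratio, which is why dropping the worst quarters moves it; the text reports that for F1 and F5 the change is small enough that the ranking between them does not change.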

Discussion and Directions for Further Research
In this section, I analyse three issues of interest in light of the results: (i) the extent to which they share common features with the international evidence, (ii) why I use the mean instead of the median as the statistic used to build the inflation factors, and (iii) an exploratory analysis of the results of the least accurate factor when forecasting total inflation (M-[.,34]-Secondary-NWest-Retail, see Table 2), asking which granular prices respondents may be targeting when they are surveyed.
The first issue of this section is to analyse to what extent the characteristics of the best forecasters found in this paper relate to the international evidence. Three facts are worth mentioning: (i) the results of this paper can be summarised as higher-income (given their county of residence) women aged between 35 and 65 years old with a college degree being the best forecasters within the survey (when using the whole sample), (ii) it is very common to find that higher-income households, men, more educated, and older respondents are the best forecasters when analysing international surveys, and (iii) it is very uncommon to find women as the best forecasters.
12 A final exploratory check concerns forecast combinations. An exercise combining the F1 and F5 factors with linear weights adding to unity suggests that pairwise combinations do not improve the accuracy of the F1 factor. In particular, using the nine-point grid ranging from 0.1 to 0.9 as the weight for F1 (and, consequently, from 0.9 to 0.1 for F5) delivers RMSFE ratios of 0.890, 0.874, 0.860, 0.849, 0.839, 0.831, 0.826, 0.823, and 0.823, which are all above the 0.811 RMSFE ratio exhibited by the F1 factor alone.
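The combination check in footnote 12 can be reproduced in outline with a weight grid. The sketch below assumes that each combined forecast is evaluated with the same RMSFE ratio against the random walk; the function name and test data are hypothetical.

```python
import numpy as np


def combination_rmsfe_ratios(f1, f5, random_walk, actual, weights=None):
    """RMSFE ratio (relative to the random walk) of w*F1 + (1 - w)*F5 for each weight w.

    By default w runs over the nine-point grid 0.1, 0.2, ..., 0.9. Returns {w: ratio}.
    """
    f1, f5, rw, a = (np.asarray(x, dtype=float) for x in (f1, f5, random_walk, actual))
    if weights is None:
        weights = [round(0.1 * k, 1) for k in range(1, 10)]
    rmsfe_rw = np.sqrt(np.mean((rw - a) ** 2))
    ratios = {}
    for w in weights:
        combo = w * f1 + (1.0 - w) * f5  # linear weights adding to unity
        ratios[w] = np.sqrt(np.mean((combo - a) ** 2)) / rmsfe_rw
    return ratios
```

If every ratio on the grid is above the one obtained with F1 alone, as reported in the footnote, combining with F5 only dilutes F1's accuracy.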
The evidence collected from the Survey of Consumers of the University of Michigan for the United States provides a good benchmark. It reports personal characteristics of the respondents such as gender, age, marital status, education, race, and income level. The findings generally suggest that white men with more than a high-school education, within the 55-65 age range, provide more accurate 12-month-ahead inflation forecasts (Bryan and Venkatu, 2001a,b; Souleles, 2004; Pfajfar and Santoro, 2008; Meyer and Venkatu, 2011; Madeira and Zafar, 2015). The "race" variable, which plays a role in forecast accuracy, is included in Bryan and Venkatu (2001a), Meyer and Venkatu (2011), Madeira and Zafar (2015), and Axelrod et al. (2018) when using the Survey of Consumers, and in Rossouw et al. (2011) for the South African case. However, it is not captured by the IEE survey, as Chile's Metropolitan Area had relatively low racial diversity when the survey's sampling was designed. Nowadays, however, as the Central Bank of Chile (2018) points out, Chile experienced a substantial immigration shock during 2015-17 (reaching 8.8% of the labour force according to the 2017 Census), concentrated in the Metropolitan Area. Moreover, the countries of origin and races of this immigration wave are more diverse, and immigrants show significant participation in the labour market (a rate of 77% in the March-May 2017 moving quarter). Thus, future sample updates of the IEE could include the race variable.
Also, Palmqvist and Strömberg (2004) Interestingly, Sabrowski (2008) uses the Business and Consumer Survey conducted by the European Commission for Germany and finds a statistically significant role of labour status (working/unemployed) in identifying the best forecasting demographic group. This issue is also treated in Malgarini (2009) for the Italian case. Similarly, Ehrmann et al. (2017) analyse the role of respondents' financial situation using the Survey of Consumers, finding that more credit-constrained respondents tend to overestimate inflation. Goyal and Parab (2019a,b) include respondents' assessment of the economic outlook when analysing consumers' inflation expectations in India. Ichiue and Nishiguchi (2015), in turn, analyse the role of asset holdings and financial literacy in the case of Japan. Diamond et al. (2020) add the respondent's type of contract as a determinant of inflation expectations, also for Japan.
The overwhelming majority of survey-based results on inflation expectations by demographic group identify men as the best forecasters when controlling for all the mentioned variables. This is the main difference between the results of this paper and the international evidence. In general, women tend to overestimate inflation because they pay additional attention to more volatile prices, overreacting when forming their inflation expectations. This hypothesis was first analysed in Jonung (1981) for the case of Sweden and also proposed in Pfajfar and Santoro (2008) using the Survey of Consumers. In Chile's case, however, this feature seems to favour women when forming their inflation expectations. Three explanations could account for this advantage: women spend more time screening a larger portion of CPI-basket item prices; Chilean total inflation is specifically driven by the items to which women pay more attention than men (e.g., foodstuffs), as suggested by Jonung (1981); or a combination of both.
Regarding the first reason, there is some supporting evidence in the Chilean National Survey on Time Use (ENUT) conducted by the National Statistics Institute (2015): among the employed, men dedicate an average of 2.85 hours a day to "unpaid work" (a category that includes purchases of household goods), while women allocate an average of 5.85 hours a day to these tasks (difference, women minus men: +3 hours). The difference grows among the unemployed, where men spend an average of 3.49 hours a day and women 7.11 hours (difference: +3.62 hours), and is slightly larger (+3.69 hours) among inactive people, with men allocating 2.54 hours and women 6.23 hours; women's unpaid work thus approaches an office-based working day of 8 hours.
This explanation is also in line with Reis (2006a, 2006b), who suggests that different agents in the economy have different incentives to be "rationally inattentive" to changes in prices. This could explain the differences across income levels and, together with the time-use results, it could also explain the difference between men and women. It is important to remark that, in this study, the sample available to construct an inflation factor for women whose employment situation is "Housework" is too small for a reliable econometric analysis. Thus, within this framework it is impossible to test the forecasting accuracy of "working" women against that of women in "housework" without a more complex sampling method, a task left for further research.
The second issue of this section relates to the use of the mean instead of the median of each group of respondents as the statistic used to build factors. Meyer and Venkatu (2011) suggest that the appropriate statistic to represent and compare demographic groups in inflation expectations is the median rather than the mean. However, this study differs from Meyer and Venkatu (2011) in the sampling method, sample span and representativeness, and the frequency with which the survey is conducted. Meyer and Venkatu (2011) use the University of Michigan's Survey of Consumers, which collects a minimum of 500 answers per month from samples designed to represent all American households (excluding those in Alaska and Hawaii). This differs from the Chilean case analysed in this paper, which refers to the Greater Metropolitan Area (while the CPI is still compiled countrywide). Also, the question about inflation expectations 13 has been asked since the early 1980s, thus covering a period of about 30 years at the time of that study (2011). In contrast, the Chilean data of Centro de Microdatos cover a period of 13 years at a quarterly frequency. All these differences pose a formidable challenge to ensuring the right representation of each characteristic that together make up an inflation factor. 14 Nevertheless, and even more relevant for the exercise carried out in this study, it is key that when more respondents fulfil the characteristics that define a factor, they can reshape the distribution and change the group's outcome, which occurs when using the mean instead of the median. With the mean, every respondent who fulfils the characteristics counts and, in the absence of further information, does so with the same weight 1/N (assuming randomness, with N being the number of respondents in the factor). In contrast, when using the median, respondents could be irrelevant despite fulfilling the criteria to be part of the factor, contradicting the setup of this exercise. For instance, suppose a factor is composed of N − 2 respondents and two new respondents must be added to it. The factor displays a median of $M_{N-2}$ when using the N − 2 sample. Suppose the new entries are $(M_{N-2} + \varepsilon_1)$ and In any case, Table 1D in Appendix D displays the results mimicking Table 2 but using the median instead of the mean. The results are sensitive to this change. A total of 14 (instead of 8) factors display an RMSFE ratio below unity, which can be explained by the insensitivity of the median to outliers. Yet, according to the Giacomini-White test, none of the factors is statistically different from the RW. All but one of the eight factors that perform better than the RW using the mean remain better than the RW when using the median; the exception is the M-[35,65]-College-SWest-Retail factor. This result highlights that excluding some respondents does not always lead to better results, and that choosing the point of the distribution with the best forecasting results for a factor could be a more sophisticated task than calculating an automated statistic.
13 "During the next 12 months, do you think that prices, in general, will go up, or go down, or stay where they are now?".
14 Kenny et al. (2015) propose a complete setup using panel estimates of density forecasts from respondents to the European Central Bank's Survey of Professional Forecasters at the microdata level. This is an avenue to be explored for the case of Chile when a unique respondent identifier becomes available.
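The argument that a median-based factor can ignore eligible respondents can be illustrated numerically. The numbers below are hypothetical; the sketch assumes an odd-sized initial sample and adds one new respondent above and one below the current median, a case in which the median cannot move (with an even-sized sample, or entries close to the centre, the median may shift).

```python
import statistics


def factor_statistics(forecasts):
    """Return (mean, median) of a factor's individual inflation forecasts."""
    return statistics.mean(forecasts), statistics.median(forecasts)


# Hypothetical factor with N - 2 = 5 respondents (odd size), median M = 4.0.
base = [2.0, 3.0, 4.0, 5.0, 6.0]
m = statistics.median(base)
# Two new eligible respondents: one at M + 3.0 and one at M - 0.5.
extended = base + [m + 3.0, m - 0.5]

print(factor_statistics(base))      # mean and median both equal 4.0
print(factor_statistics(extended))  # median stays at 4.0, while the mean rises
```

The mean responds to both new answers with weight 1/N each, which is the property the text relies on when building factors from every respondent who fits the profile.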
The third issue of this section consists of analysing the factor with the least accurate performance (according to the RMSFE loss function) when forecasting end-of-period headline inflation. The M-[.,34]-Secondary-NWest-Retail factor (see Table 2) shows an RMSFE ratio of 3.307. Thus, the question is whether the respondents with these characteristics aimed at forecasting a specific price item or a reduced subset of the CPI basket. To that end, I use the granular 2018 CPI basket price data constructed in Alvarado and Medel (2020) to calculate the correlation coefficient between this factor and each of the 303 items composing the current CPI basket. An important feature is that the correlation is estimated contemporaneously, without considering the gap between the date on which respondents answer and the horizon they are asked about. This is so (and, thus, the correlation coefficient rather than the RMSFE is used) because respondents could easily think of "12 months ahead" as a vague future date that does not necessarily match the months in which the questions are posed. This distinction is highly sensitive for the RMSFE calculation over the 303 items of the CPI basket. Therefore, the results, based on the contemporaneous correlation coefficient, are exploratory in nature.
Table 9 CPI basket items with higher correlation (> 40%) with the most inaccurate factor for total inflation.
The results are shown in Table 9. They consist of all items with a correlation coefficient greater than 40% with the most inaccurate factor (first column): five out of 303 items fulfil this criterion, 15 almost all of them classified as necessities: potatoes, canned vegetables, milk flavour, homeopathic medicines and food supplements, and furniture repair service. Also, the correlation between the M-[.,34]-Secondary-NWest-Retail factor and total inflation is -0.11% for the 2005.II-2018.IV (55 observations) sample, so the two are almost unrelated in contemporaneous terms. To delve further into the target of the analysed factor, the items can be grouped into three sets (foodstuffs, medicines, and home maintenance), which could be associated with the spending composition of lower-income households.
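The item screening behind Table 9 amounts to computing a contemporaneous Pearson correlation between the factor and each item's inflation series and keeping those above 40%. The sketch below is a minimal illustration; the function name, the dictionary layout, and the assumption that all series are already aligned quarter by quarter are hypothetical.

```python
import numpy as np


def screen_items(factor, item_inflation, threshold=0.4):
    """Return {item: corr} for items whose contemporaneous correlation with the factor exceeds threshold.

    item_inflation maps an item name to its inflation series, aligned quarter by
    quarter with the factor. Results are sorted from highest to lowest correlation.
    """
    f = np.asarray(factor, dtype=float)
    selected = {}
    for name, series in item_inflation.items():
        r = np.corrcoef(f, np.asarray(series, dtype=float))[0, 1]
        if r > threshold:
            selected[name] = float(r)
    return dict(sorted(selected.items(), key=lambda kv: kv[1], reverse=True))
```

Because no lead or lag is imposed, the screen is robust to respondents treating "12 months ahead" loosely, at the cost of saying nothing about forecast accuracy in the RMSFE sense.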
According to CLAPES-UC (2020), an estimate of the 2018-based CPI basket for the poorest-income quantile gives "Food and non-alcoholic beverages" a greater weight of 26.34%, 7.04 percentage points higher than its weight in the whole CPI basket. In contrast, "Health" represents 6.77% of the poorest-income quantile basket, 1 percentage point less than in the whole CPI basket, and "Home equipment and maintenance" accounts for 4.46% of the poorest-income quantile basket, 2.07 percentage points less than its weight in the whole basket. Note that a greater weight of food items is generally associated with the spending of poorer households. Also, as public health in Chile is free or provided at a subsidised low end-price for poorer households, the CPI better captures private health prices, which follow a more market-oriented logic. Thus, it is not surprising that this CPI category is less representative of the poorest-income quantile. In addition, the identified item corresponds to what is considered alternative (homeopathic) medicine in Chile rather than conventional Western medicine, which suggests cheaper treatment alternatives. Finally, the label "Home equipment and maintenance" does not distinguish which share is associated with "maintenance", but spending on "Furniture repair service" certainly suggests the use of a second-best alternative to new furniture. In sum, there is some evidence suggesting that the M-[.,34]-Secondary-NWest-Retail factor is inaccurate when forecasting total inflation because its respondents target a subset of the whole CPI. However, they could be very accurate when forecasting a CPI basket more oriented to lower-income households. Also, this result does not rule out that the respondents composing this factor follow a price-expectation formation process similar to that of the respondents in the most accurate factor but with different targets, which deserves more in-depth research.

Concluding Remarks
This article evaluated quantitative inflation forecasts for the Chilean economy, taking advantage of a specific survey of consumer perceptions at the individual microdata level, which is, in turn, linked to a survey of employment and unemployment in Chile's capital city, Santiago. Thus, the key advantage of the database is that it is possible to link, with no error, consumer perceptions and 12-month-ahead inflation forecasts with the labour-market characteristics of the respondents, which heavily rely on personal (yet anonymous) characteristics. This is done by constructing subsets of inflation expectations factors with fully identifiable and mutually exclusive characteristics such as gender, age, education, county of residence, and the economic sector of current work.
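Building the factors amounts to grouping respondents by their full profile of characteristics and averaging their 12-month-ahead forecasts within each quarter (the mean, as argued in the discussion section). The sketch below is hypothetical in its field names, category labels, and data layout; it only illustrates the grouping logic.

```python
from collections import defaultdict
from statistics import mean

PROFILE_KEYS = ("gender", "age", "education", "zone", "sector")  # hypothetical field names


def build_factors(responses):
    """Average individual 12-month-ahead forecasts by characteristic profile and quarter.

    Each response is a dict holding the PROFILE_KEYS, a 'quarter' label and a
    'forecast' value. Returns {profile_tuple: {quarter: mean_forecast}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for r in responses:
        profile = tuple(r[k] for k in PROFILE_KEYS)
        grouped[profile][r["quarter"]].append(r["forecast"])
    return {
        profile: {q: mean(vals) for q, vals in sorted(by_quarter.items())}
        for profile, by_quarter in grouped.items()
    }
```

Because each respondent has exactly one value per characteristic, every response falls into exactly one profile, which is what makes the factors mutually exclusive.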
By using a quarterly sample ranging from 2005.II to 2018.IV, the results for total inflation suggest that women aged between 35 and 65 years old, with a college degree, living in the North-eastern part of the city (the area with the highest living standards in the country), and working in the Community and Social Services sector are the best forecasters. Men aged between 35 and 65 years old with a college degree tie for second place: those living in the South-eastern part of the city and working in Retail, and those living in the North-eastern part and working in Government and Financial Services. Next, men aged between 35 and 65 years old, with a college degree, living in the North-western part of the city and working in Government and Financial Services come in fourth place. From the fifth to the eighth places, the common characteristic is being a man aged between 35 and 65 years old, with respondents concentrated in Community and Social Services and in Government and Financial Services, but with a different spatial distribution. These results are obtained by comparing the inflation expectations factors to the naïve RW forecast. Only these eight out of 648 possible factors outperform the RW, and none of them is statistically superior according to the Giacomini and White (2006) test.
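The Giacomini and White (2006) comparison can be sketched in its unconditional form, which, with a constant test function and squared-error loss, reduces to a Diebold-Mariano-type t statistic on the loss differential. This is only an illustration: the test function, loss, and variance estimator used in the paper are not reported here, so the Newey-West (Bartlett) long-run variance with horizon − 1 lags and the function name are assumptions.

```python
import math

import numpy as np


def unconditional_gw_test(err_a, err_b, horizon=4):
    """Test equal expected squared-error loss of two forecasts (unconditional version).

    d_t = e_a,t^2 - e_b,t^2; its mean is standardised with a Newey-West
    (Bartlett) long-run variance using horizon - 1 lags, to allow for the
    overlap of multi-step forecast errors. Returns (t_stat, two_sided_p);
    a negative t_stat favours forecast a.
    """
    d = np.asarray(err_a, dtype=float) ** 2 - np.asarray(err_b, dtype=float) ** 2
    n = d.size
    dc = d - d.mean()
    lrv = dc @ dc / n
    for j in range(1, horizon):
        weight = 1.0 - j / horizon  # Bartlett weight with horizon - 1 lags
        lrv += 2.0 * weight * (dc[j:] @ dc[:-j]) / n
    t_stat = d.mean() / math.sqrt(lrv / n)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|t|))
    return t_stat, p_value
```

Applied to a factor's errors (err_a) and the random walk's errors (err_b), a ratio below unity with a large p-value is exactly the pattern reported for the eight best factors: better on average, but not significantly so.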
Several econometric exercises are also conducted to further discriminate between the best inflation factors, revealing that a factor different from the second-best in terms of forecast accuracy displays the characteristics required of a forecasting variable. Remarkably, this factor has the same specifications as the winning factor, the only difference being that it is composed of men instead of women. Thus, there is potentially room to delve into more intricate schemes to take full advantage of the predictive information of the overall survey, a task left for further research.
The results of this study differ from the general conclusions of the international evidence in that women are better forecasters than men. All other comparable characteristics (higher income, higher educational level, and age range) align with the common findings. This suggests that, in Chile, either women pay more attention to a wider range of CPI-basket price items, or Chilean total inflation is driven specifically by the items for which women form their expectations much better than men. Finally, the most inaccurate factor when predicting total inflation appears to owe its result to targeting a subset of the CPI basket focused on lower-income household spending. This does not necessarily suggest that these respondents form their expectations in the wrong way or differently from those in the winning factor.
It is important to remark that this whole evaluation is made to reveal how efficiently groups of respondents use information and form inflation expectations, with neither ex-post statistical intervention nor factor combinations. 16 The results presented in this article are important because they help to identify the most accurate group when forecasting inflation and, thus, help refine the information provided by the survey for inflation forecasting purposes.

Appendix A - Surveys of Inflation Expectations in Chile
In this appendix, I compare the Universidad de Chile's Centro de Microdatos inflation expectations average ("UChile") with seven surveys asking about the same horizon, plus two at 24 months ahead (Chile's official monetary policy horizon), together with the actual total inflation series.
The comparison is made through the boxplots of Figure A.1. 17 Note that "UChile" is the survey with the worst performance when using all raw data, in terms of both central tendency and dispersion, with the distinctive feature that it is the only survey in this figure conducted on a quarterly basis, whereas the remaining ones are conducted on a monthly or daily basis. The other consumer-expectations survey is "IPEC" ("Índice de Percepción de la Economía", elaborated by Adimark), whereas the remaining ones are answered by either experts or professional analysts. The best results at the 12-month horizon are obtained with "EEE:11" ("Encuesta de Expectativas Económicas", elaborated by the Central Bank of Chile), "BLMG" (Bloomberg survey), "CF:12" (Consensus Economics survey, applying a weighting scheme to transform it from a moving horizon to a 12-month fixed horizon), and "EOF:12" ("Encuesta de Operadores Financieros", conducted by the Central Bank of Chile before each Monetary Policy Meeting). The last two boxplots display the EEE and EOF inflation expectations at 24 months ahead, with little or virtually no deviation from the 3% inflation target within the sample. "IMCE:M" and "IMCE:R" correspond to the "Indicador Mensual de Confianza Empresarial" (entrepreneurs), elaborated by Instituto Chileno de Administración Racional de Empresas (ICARE) and Universidad Adolfo Ibáñez (UAI), where "M" stands for "Manufacturing" and "R" for "Retail". These two surveys are not as accurate as those answered by experts and professional analysts, but they are certainly much better than "UChile". Figure A.1 thus puts into perspective the challenge to be addressed and the rationale for searching for the best forecasters within the "UChile" employment survey.
Notes: All data are transformed to annual variation in order to be compared to the 3% annual variation official inflation target. "Actual" stands for annual variation of total CPI inflation in monthly frequency. Sample: Jan-00/Jul-20 (247 observations). Source: National Statistics Institute. "EEE:11" and "EEE:23", plus "EOF:12" and "EOF:24", stand for "Encuesta de Expectativas Económicas" at the 11- and 23-month horizons, and "Encuesta de Operadores Financieros" (prior to each Monetary Policy Meeting) at 12 and 24 months ahead, respectively. Sample: EEE: Sep-01/Jul-20 (227 observations); EOF: Dec-09/Jun-20 (127 observations). Source: Central Bank of Chile. "BLMG" stands for Bloomberg daily median inflation forecast, considering the last day of each month. Sample: Jan-08/Jul-20 (151 observations). Source: Bloomberg. "CF:12" corresponds to a 24-term weighted average between the "current year" and the "next year" horizons of the Consensus Forecast report to reflect a unique 12-month comparable horizon. Sample: Jan-00/Jul-20 (247 observations). Source: Consensus Economics. "UChile" stands for the Universidad de Chile's Centro de Microdatos Survey of Perception and Expectations on the Economic Situation in Greater Santiago quarterly data. Sample: 2005.II/2018.IV (55 observations). Source: Centro de Microdatos. "IPEC" stands for "Índice de Percepción de la Economía" (consumers); the original index, with 50 as a neutral value, is extrapolated as a 3% inflation rate. Sample: Mar-02/Jun-20 (220 observations). Source: Adimark. "IMCE" stands for "Indicador Mensual de Confianza Empresarial" (entrepreneurs), where "M" stands for "Manufacturing" and "R" for "Retail". Sample: May-05/Jun-20 (182 observations). Source: Instituto Chileno de Administración Racional de Empresas (ICARE) and Universidad Adolfo Ibáñez (UAI).
17 Note that Figure A.1 depicts survey-based inflation expectations only. It could, however, be easily extended using information from financial assets, from which breakeven inflation rates can be extracted, although those expectations are of a different nature. The complete graph, including the expectations derived from financial assets, is available here: https://drive.google.com/file/d/15qXSyOCncBg5SQ0RM-lJdBLW-87ZPZHF/view?usp=sharing.
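The conversion of Consensus Economics' calendar-year ("current year" and "next year") forecasts into a fixed 12-month horizon can be sketched with a widely used linear approximation. This weighting is an assumption for illustration and not necessarily the exact scheme behind "CF:12": in calendar month m (1 = January), the current-year forecast receives weight (13 − m)/12 and the next-year forecast receives the remainder. The function name is hypothetical.

```python
def fixed_horizon_forecast(current_year, next_year, month):
    """Approximate a 12-month-ahead forecast from two calendar-year (fixed-event) forecasts.

    Assumed weighting: in calendar month `month` (1 = January), the current-year
    forecast gets weight (13 - month) / 12 and the next-year forecast the rest.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    w_current = (13 - month) / 12
    return w_current * current_year + (1 - w_current) * next_year


if __name__ == "__main__":
    # Hypothetical forecasts: 3.0% for the current year, 4.0% for the next one.
    for m in (1, 7, 12):
        print(m, fixed_horizon_forecast(3.0, 4.0, m))
```

The weights shift smoothly from the current-year to the next-year forecast as the year progresses, so the combined series can be compared with surveys that ask directly about a 12-month horizon.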

Appendix C - A Visual Comparison Between the Best Factors
In this appendix, I analyse the boxplots of the eight factors that outperform the RW (see Table 3) compared to the aggregate "UChile" factor and the actual total inflation series. The boxplots are shown in Figure C.1.
The first fact to note is that all factors display enhanced forecast accuracy compared to the aggregate in terms of the mean, whereas some of them (e.g., F6, F7, and F8) are worse in terms of dispersion, i.e., they show a larger number of outliers. Remarkably, F1 displays just a few outliers, similarly to F4 and F5, but F4 has a greater interquartile range. Also, the medians of F1, F3, and F5 are the closest to the 3% target, but F3 and F5 display more outliers than F1 because they have a tighter interquartile range. Thus, the F1, F3, and F5 factors come out as the most promising factors within the survey, deserving further analysis of their differences.
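The link between a tighter interquartile range and more outliers follows from how boxplot whiskers are usually drawn. The sketch below assumes the conventional Tukey rule (points beyond 1.5 times the interquartile range from the quartiles are flagged, as in matplotlib's default `whis=1.5`); whether Figure C.1 uses exactly this convention is an assumption, and the function name is hypothetical.

```python
import numpy as np


def boxplot_summary(values, whisker=1.5):
    """Median, interquartile range and number of points outside the Tukey whiskers."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    n_out = int(np.sum((x < low) | (x > high)))
    return {"median": med, "iqr": iqr, "outliers": n_out}
```

Since the whiskers scale with the interquartile range, a factor whose forecasts cluster tightly around the median will flag even moderate deviations as outliers, which is why F3 and F5 can show more outliers than F1 despite similar medians.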

Figure C.1 (boxplots; vertical axes in percentage)
Notes: All data show the CPI inflation annual variation. "Actual" stands for annual variation of total inflation in monthly frequency. Sample: Jan-00/Jul-20 (247 observations). "UChile" stands for the Universidad de Chile's Centro de Microdatos Survey of Perception and Expectations on the Economic Situation in Greater Santiago quarterly data. Sample: 2005.II-2018.IV (55 observations). "F1" to "F8" stand for the inflation factors that outperform the RW (see Table 3). Source: Author's calculations based on National Statistics Institute and Universidad de Chile's Centro de Microdatos database.

Appendix D - RMSFE Ratio Results Using the Median Factor