Can we trust PCA on non-stationary data?
Date: 13 August 2020, Thursday
Time: 10am
Speaker: Dr Yanrong Yang (ANU)
Abstract:
This paper establishes asymptotic properties of spiked empirical eigenvalues for high-dimensional data with both cross-sectional dependence and a dependent sample structure. A new finding from these theoretical results is that, under some scenarios, spiked empirical eigenvalues reflect the dependent sample structure rather than the cross-sectional structure, which indicates that principal component analysis (PCA) may provide inaccurate inference for the cross-sectional structure. An illustrative example shows that some commonly used statistics based on spiked empirical eigenvalues mis-estimate the true number of common factors. As an application to high-dimensional time series, we propose a test statistic to distinguish a unit root from a factor structure, and demonstrate its effective finite-sample performance on simulated data. Our results are then applied to analyse OECD health care expenditure data and US mortality data, both of which possess cross-sectional dependence as well as non-stationary temporal dependence. Notably, our results contribute a statistical justification for the benchmark paper by Lee and Carter (1992) on mortality forecasting.
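The phenomenon described in the abstract can be illustrated with a small simulation. The sketch below is not the speaker's method or test statistic; it is a minimal illustration, under assumed settings (T = 200 time points, N = 50 series, Gaussian innovations, a fixed seed, and a hypothetical helper `top_eigenvalue_share`), of how the leading sample eigenvalue can look "spiked" for independent random walks even though they share no common factor.

```python
import numpy as np


def top_eigenvalue_share(X):
    """Share of total sample variance captured by the largest eigenvalue.

    X has shape (T, N): T time points (rows), N cross-sectional units (columns).
    Hypothetical helper, introduced only for this illustration.
    """
    Xc = X - X.mean(axis=0)            # demean each series over time
    cov = Xc.T @ Xc / X.shape[0]       # N x N sample covariance matrix
    eigvals = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals.sum()


rng = np.random.default_rng(0)         # fixed seed (assumption) for reproducibility
T, N = 200, 50                         # assumed sample size and dimension

# Case 1: i.i.d. noise, no temporal and no cross-sectional dependence.
noise = rng.standard_normal((T, N))

# Case 2: N mutually independent random walks (unit roots), still no common factor.
random_walks = np.cumsum(noise, axis=0)

share_noise = top_eigenvalue_share(noise)
share_walks = top_eigenvalue_share(random_walks)

print(f"top eigenvalue share, i.i.d. noise:        {share_noise:.3f}")
print(f"top eigenvalue share, independent walks:   {share_walks:.3f}")
```

In this setup neither data set contains a common factor, yet the first principal component of the random walks accounts for a much larger share of the variance than that of the noise. A factor-number criterion based only on the leading eigenvalues could therefore be misled by the temporal dependence.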
Link: https://anu.zoom.us/j/425258947