research findings

Chang, J., Qiu, Y., Yao, Q., & Zou, T. (2018). Confidence regions for entries of a large precision matrix. Journal of Econometrics, 206, 57-82.

We consider the statistical inference for high-dimensional precision matrices. Specifically, we propose a data-driven procedure for constructing a class of simultaneous confidence regions for a subset of the entries of a large precision matrix. The confidence regions can be applied to test for specific structures of a precision matrix, and to recover its nonzero components. We first construct an estimator for the precision matrix via penalized node-wise regression. We then develop the Gaussian approximation to approximate the distribution of the maximum difference between the estimated and the true precision coefficients. A computationally feasible parametric bootstrap algorithm is developed to implement the proposed procedure. The theoretical justification is established under the setting which allows temporal dependence among observations. Therefore the proposed procedure is applicable to both independent and identically distributed data and time series data. Numerical results with both simulated and real data confirm the good performance of the proposed method.

Chang, J., Delaigle, A., Hall, P., & Tang, C. Y. (2018). A frequency domain analysis of the error distribution from noisy high-frequency data. Biometrika, 105, 353-369.

Data observed at a high sampling frequency are typically assumed to be an additive composite of a relatively slow-varying continuous-time component, a latent stochastic process or smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally-spaced observed times, is consistent and minimax rate-optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and an application to real data validate our analysis.

Chang, J., Guo, J., & Tang, C. Y. (2018). Peter Hall’s contribution to empirical likelihood. Statistica Sinica, 28, 2375-2387.

We deeply mourn the loss of Peter Hall. Peter was the premier mathematical statistician of his era. His work illuminated many aspects of statistical thought. While his body of work on bootstrap and nonparametric smoothing is widely known and appreciated, less well known is his work in many other areas. In this article, we review Peter’s contribution to empirical likelihood (EL). Peter has done fundamental work on studying the coverage accuracy of confidence regions constructed with EL.

He, J., & Chen, S. X. (2018). High-dimensional two-sample covariance matrix testing via super-diagonals. Statistica Sinica, 28, 2671-2696.

This paper considers testing for two-sample covariance matrices of high-dimensional populations. We formulate a multiple test procedure by comparing the super-diagonals of the covariance matrices. The asymptotic distributions of the test statistics are derived and the powers of individual tests are studied. The test statistics, by focusing on the super-diagonals, have smaller variation than the existing tests that target the entire covariance matrix. The advantage of the proposed test is demonstrated by simulation studies, as well as an empirical study on a prostate cancer dataset.

Chen, L., Guo, B., Huang, J., He, J., Wang, H., Zhang, S., & Chen, S. X. (2018). Assessing air-quality in Beijing-Tianjin-Hebei region: The method and mixed tales of PM2.5 and O3. Atmospheric Environment, 193, 290-301.

Motivated by a need to evaluate the effectiveness of a campaign to alleviate the notorious air pollution in China’s Beijing-Tianjin-Hebei (BTH) region, we outline a temporal statistical adjustment method and demonstrate, from several aspects, its ability to remove the meteorological confounding present in the air quality data. The adjustment makes the adjusted average concentration temporally comparable, and hence can be used to evaluate the effectiveness of the emission reduction strategies over time. When the method is applied to four major pollutants from 73 air quality monitoring sites, along with meteorological data, the adjusted averages indicate a substantial regional reduction from 2013 to 2016 in PM2.5 by 27% and SO2 by 51%, which benefited from the elimination of high-energy-consumption and high-polluting equipment and a 20.7% decline in coal consumption, while average NO2 levels remained nearly static with a mere 4.5% decline. Our study also reveals a significant increase in ground-level O3 of 11.3%. These findings suggest that future air quality management plans in BTH have to be based on dual targets of PM2.5 and O3.

Chang, J., Zheng, C., Zhou, W. X., & Zhou, W. (2017). Simulation‐based hypothesis testing of high dimensional means under covariance heterogeneity. Biometrics, 73, 1300-1310.

In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and parametric bootstrap techniques to compute the critical values. Unlike the existing tests that rely heavily on structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance the power of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may assist in detecting disease-associated gene-sets. The proposed methods have been implemented in an R-package HDtest and are available on CRAN.
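
As a rough illustration of the idea in the abstract above (a maximum-type statistic whose critical value comes from a parametric bootstrap), the following is a minimal one-sample sketch in Python. It is not the HDtest implementation and omits the screening step. The function name `max_type_mean_test`, the studentization by per-coordinate standard deviations, and the way Gaussian draws are generated from the standardized data are all assumptions made for this example.

```python
import numpy as np

def max_type_mean_test(X, n_boot=1000, seed=0):
    """Hypothetical sketch: test H0 'mean vector = 0' with a max-type statistic.

    X is an (n, p) data matrix. Returns (statistic, bootstrap p-value).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    xbar = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)

    # Max-type statistic: largest studentized coordinate of the sample mean.
    stat = np.sqrt(n) * np.max(np.abs(xbar) / sd)

    # Parametric bootstrap: draw Gaussian vectors whose covariance equals the
    # sample correlation matrix. Multiplying standard normal weights by the
    # standardized data gives such draws without forming the p x p matrix.
    Xs = (X - xbar) / sd
    G = rng.standard_normal((n_boot, n))
    Z = G @ Xs / np.sqrt(n - 1)
    boot = np.max(np.abs(Z), axis=1)

    # Proportion of bootstrap maxima at least as large as the observed one.
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value
```

Calling `max_type_mean_test(X)` on an `(n, p)` array returns the statistic and a p-value. Because only maxima of Gaussian draws are needed, no structural assumption on the covariance matrix enters the calibration in this sketch.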

Chang, J., Zhou, W., Zhou, W. X., & Wang, L. (2017). Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics, 73, 31-41.

Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN.

Chang, J., Yao, Q., & Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika, 104, 111-127.

We propose a new omnibus test for vector white noise using the maximum absolute autocorrelations and cross-correlations of the component series. Based on an approximation by the L∞-norm of a normal random vector, the critical value of the test can be evaluated by bootstrapping from a multivariate normal distribution. In contrast to the conventional white noise test, the new method is proved to be valid for testing departure from white noise that is not independent and identically distributed. We illustrate the accuracy and the power of the proposed test by simulation, which also shows that the new test outperforms several commonly used methods, including the Lagrange multiplier test and the multivariate Box–Pierce portmanteau tests, especially when the dimension of the time series is high in relation to the sample size. The numerical results also indicate that the performance of the new test can be further enhanced when it is applied to pre-transformed data obtained via the time series principal component analysis proposed by J. Chang, B. Guo and Q. Yao (arXiv:1410.2323). The proposed procedures have been implemented in an R package.
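
To make the form of the statistic in the abstract above concrete, here is a minimal Python sketch that computes only the maximum absolute auto- and cross-correlation over lags 1 to K, scaled by the square root of the sample size. The function name `max_cross_correlation`, the `max_lag` parameter, and the scaling are assumptions for illustration. The Gaussian bootstrap calibration of the critical value described in the abstract is not reproduced here.

```python
import numpy as np

def max_cross_correlation(X, max_lag=2):
    """Hypothetical sketch: sqrt(n) times the largest absolute sample
    auto-/cross-correlation of an (n, p) series over lags 1..max_lag."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Per-series normalizer so that entries below are correlations.
    scale = np.sqrt((Xc ** 2).sum(axis=0))
    stat = 0.0
    for k in range(1, max_lag + 1):
        # Entry (i, j) pairs series i at time t + k with series j at time t.
        gamma_k = Xc[k:].T @ Xc[:-k]
        rho_k = gamma_k / np.outer(scale, scale)
        stat = max(stat, np.sqrt(n) * np.abs(rho_k).max())
    return stat
```

Large values of the returned statistic point away from white noise. Turning it into a test requires a critical value, which the paper obtains by bootstrapping from a multivariate normal distribution.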

Zhou, W., & Peng, Z. (2017). Asymptotic behavior of bivariate Gaussian powered extremes. Journal of Mathematical Analysis and Applications, 455, 923-938.

In this paper, joint asymptotics of powered maxima for a triangular array of bivariate Gaussian random vectors are considered. Under the Hüsler–Reiss condition, limiting distributions of powered maxima are derived. Furthermore, the second-order expansions of the joint distributions of powered maxima are established under the refined Hüsler–Reiss condition.

Li, C., & Su, L. (2017). Extracting harmonic signal from a chaotic background with local linear model. Mechanical Systems and Signal Processing, 84, 499-515.

In this paper, the problems of blind detection and estimation of a harmonic signal in a strong chaotic background are analyzed, and new methods using the local linear (LL) model are put forward. The LL model has been exhaustively researched and successfully applied for fitting and forecasting chaotic signals in many chaotic fields. We enlarge the modeling capacity substantially. Firstly, we can predict the short-term chaotic signal and obtain the fitting error based on the LL model. Then we detect the frequencies from the fitting error by periodogram. A property of the fitting error is proposed which has not been addressed before, and this property ensures that the detected frequencies are similar to those of the harmonic signal. Secondly, we establish a two-layer LL model to estimate the determinate harmonic signal in a strong chaotic background. To estimate this simply and effectively, we develop an efficient backfitting algorithm to select and optimize the parameters that are hard to search for exhaustively. In the method, based on the sensitivity of chaotic motion to initial values, the minimum fitting error criterion is used as the objective function to estimate the parameters of the two-layer LL model. Simulation shows that the two-layer LL model and its estimation technique have appreciable flexibility to model the determinate harmonic signal in different chaotic backgrounds (Lorenz, Henon and Mackey–Glass (M–G) equations). Specifically, the harmonic signal can be extracted well with low SNR and the developed background algorithm satisfies the condition of convergence in repeated 3–5 times.
