Research Output

常晋源, Guo, J., & Tang, C. Y. (2018). Peter Hall’s contribution to empirical likelihood. Statistica Sinica, 28, 2375-2387.

We deeply mourn the loss of Peter Hall. Peter was the premier mathematical statistician of his era. His work illuminated many aspects of statistical thought. While his body of work on bootstrap and nonparametric smoothing is widely known and appreciated, less well known is his work in many other areas. In this article, we review Peter’s contribution to empirical likelihood (EL). Peter has done fundamental work on studying the coverage accuracy of confidence regions constructed with EL.

何婧, & Chen, S. X. (2018). High-dimensional two-sample covariance matrix testing via super-diagonals. Statistica Sinica, 28, 2671-2696.

This paper considers testing for two-sample covariance matrices of high-dimensional populations. We formulate a multiple test procedure by comparing the super-diagonals of the covariance matrices. The asymptotic distributions of the test statistics are derived and the powers of individual tests are studied. The test statistics, by focusing on the super-diagonals, have smaller variation than the existing tests that target the entire covariance matrix. The advantage of the proposed test is demonstrated by simulation studies, as well as an empirical study on a prostate cancer dataset.

陈磊, Guo, B., Huang, J., 何婧, Wang, H., Zhang, S., & Chen, S. X. (2018). Assessing air-quality in Beijing-Tianjin-Hebei region: The method and mixed tales of PM2.5 and O3. Atmospheric Environment, 193, 290-301.

Motivated by a need to evaluate the effectiveness of a campaign to alleviate the notorious air pollution in China’s Beijing-Tianjin-Hebei (BTH) region, we outline a temporal statistical adjustment method and demonstrate from several aspects its ability to remove the meteorological confounding present in the air quality data. The adjustment makes the adjusted average concentration temporally comparable, and hence can be used to evaluate the effectiveness of the emission reduction strategies over time. Applying the method to four major pollutants from 73 air quality monitoring sites along with meteorological data, the adjusted averages indicate a substantial regional reduction from 2013 to 2016 in PM2.5 by 27% and SO2 by 51%, benefiting from the elimination of high-energy-consumption and highly polluting equipment and a 20.7% decline in coal consumption, while average NO2 levels were static with a mere 4.5% decline. Our study also reveals a significant increase in ground O3 by 11.3%. These findings suggest that future air quality management plans in BTH have to be based on dual targets of PM2.5 and O3.

常晋源, Zheng, C., Zhou, W. X., & Zhou, W. (2017). Simulation‐based hypothesis testing of high dimensional means under covariance heterogeneity. Biometrics, 73, 1300-1310.

In this article, we study the problem of testing the mean vectors of high dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and the parametric bootstrap techniques to compute the critical values. Different from the existing tests that heavily rely on the structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy wide scope of applicability in practice. To enhance powers of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may assist in detecting disease-associated gene-sets. The proposed methods have been implemented in an R-package HDtest and are available on CRAN.
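As a rough illustration of the idea in this abstract, the sketch below runs a one-sample maximum-type test of the null hypothesis that every mean is zero, with the critical value taken from a Gaussian parametric bootstrap. It is a simplified sketch under stated assumptions, not the procedure from the paper or the HDtest package: the function name `max_type_mean_test`, its parameters, and the choice to simulate from a normal distribution whose covariance is the sample correlation matrix are all illustrative, and the feature screening step is omitted.

```python
import numpy as np


def max_type_mean_test(x, n_boot=2000, alpha=0.05, seed=0):
    """One-sample max-type test of H0: all p means are zero (illustrative sketch).

    x is an (n, p) data matrix. The name and arguments are hypothetical.
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape

    xbar = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)

    # Maximum of the absolute marginal t-type statistics.
    stat = np.sqrt(n) * np.max(np.abs(xbar) / sd)

    # Standardised, centred data: xs.T @ xs / (n - 1) is the sample correlation matrix.
    xs = (x - xbar) / sd

    # Each row of z is a draw from N(0, sample correlation matrix), obtained as a
    # Gaussian combination of the rows of xs; no p x p matrix is ever formed.
    z = rng.standard_normal((n_boot, n)) @ xs / np.sqrt(n - 1)
    boot_max = np.abs(z).max(axis=1)

    crit = np.quantile(boot_max, 1 - alpha)
    p_value = float(np.mean(boot_max >= stat))
    return {
        "statistic": float(stat),
        "critical_value": float(crit),
        "p_value": p_value,
        "reject": bool(stat > crit),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.standard_normal((50, 200))  # n = 50 observations, p = 200 coordinates
    data[:, 0] += 2.0                      # one coordinate with a non-zero mean
    print(max_type_mean_test(data))
```

Because the bootstrap draws reuse the observed dependence between coordinates, no structural condition on the covariance matrix is imposed, which is the property the abstract emphasises.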

常晋源, Zhou, W., Zhou, W. X., & Wang, L. (2017). Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Biometrics, 73, 31-41.

Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN.

常晋源, Yao, Q., & Zhou, W. (2017). Testing for high-dimensional white noise using maximum cross-correlations. Biometrika, 104, 111-127.

We propose a new omnibus test for vector white noise using the maximum absolute autocorrelations and cross-correlations of the component series. Based on an approximation by the L∞-norm of a normal random vector, the critical value of the test can be evaluated by bootstrapping from a multivariate normal distribution. In contrast to the conventional white noise test, the new method is proved to be valid for testing departure from white noise that is not independent and identically distributed. We illustrate the accuracy and the power of the proposed test by simulation, which also shows that the new test outperforms several commonly used methods, including the Lagrange multiplier test and the multivariate Box–Pierce portmanteau tests, especially when the dimension of the time series is high in relation to the sample size. The numerical results also indicate that the performance of the new test can be further enhanced when it is applied to pre-transformed data obtained via the time series principal component analysis proposed by J. Chang, B. Guo and Q. Yao (arXiv:1410.2323). The proposed procedures have been implemented in an R package.

周玮 & Peng, Z. (2017). Asymptotic behavior of bivariate Gaussian powered extremes. Journal of Mathematical Analysis and Applications, 455, 923-938.

In this paper, joint asymptotics of powered maxima for a triangular array of bivariate Gaussian random vectors are considered. Under the Hüsler–Reiss condition, limiting distributions of powered maxima are derived. Furthermore, the second-order expansions of the joint distributions of powered maxima are established under the refined Hüsler–Reiss condition.

Zhang, S., Guo, B., Dong, A., 何婧, Xu, Z., & Chen, S. X. (2017). Cautionary tales on air-quality improvement in Beijing. Proceedings of the Royal Society A, 473, 20170457.

The official air-quality statistic reported that Beijing had a 9.9% decline in the annual concentration of PM2.5 in 2016. While this statistic offered some relief for the inhabitants of the capital, we present several analyses based on Beijing’s PM2.5 data of the past 4 years at 36 monitoring sites along with meteorological data of the past 7 years. The analyses reveal the air pollution situation in 2016 was not as rosy as the 9.9% decline would convey, and improvement if any was rather uncertain. The paper also provides an assessment on the city’s PM2.5 situation in the past 4 years.

常晋源, Shao, Q. M., & Zhou, W. X. (2016). Cramér-type moderate deviations for Studentized two-sample U-statistics with applications. The Annals of Statistics, 44, 1931-1956.

Two-sample U-statistics are widely used in a broad range of applications, including those in the fields of biostatistics and econometrics. In this paper, we establish sharp Cramér-type moderate deviation theorems for Studentized two-sample U-statistics in a general framework, including the two-sample t-statistic and Studentized Mann–Whitney test statistic as prototypical examples. In particular, a refined moderate deviation theorem with second-order accuracy is established for the two-sample t-statistic. These results extend the applicability of the existing statistical methodologies from the one-sample t-statistic to more general nonlinear statistics. Applications to two-sample large-scale multiple testing problems with false discovery rate control and the regularized bootstrap method are also discussed.

常晋源, Tang, C. Y., & Wu, Y. (2016). Local independence feature screening for nonparametric and semiparametric models by marginal empirical likelihood. The Annals of Statistics, 44, 515.

We consider an independence feature screening technique for identifying explanatory variables that locally contribute to the response variable in high-dimensional regression analysis. Without requiring a specific parametric form of the underlying data model, our approach accommodates a wide spectrum of nonparametric and semiparametric model families. To detect the local contributions of explanatory variables, our approach constructs empirical likelihood locally in conjunction with marginal nonparametric regressions. Since our approach actually requires no estimation, it is advantageous in scenarios such as the single-index models where even specification and identification of a marginal model is an issue. By automatically incorporating the level of variation of the nonparametric regression and directly assessing the strength of data evidence supporting local contribution from each explanatory variable, our approach provides a unique perspective for solving feature screening problems. Theoretical analysis shows that our approach can handle data dimensionality growing exponentially with the sample size. With extensive theoretical illustrations and numerical examples, we show that the local independence screening approach performs promisingly.
