Publications

张佳, & Chen, X. (2019). Robust sufficient dimension reduction via ball covariance. Computational Statistics & Data Analysis, 140, 144-154.

Sufficient dimension reduction is an important branch of dimension reduction, which includes variable selection and projection methods. Most of the sufficient dimension reduction methods are sensitive to outliers and heavy-tailed predictors, and require strict restrictions on the predictors and the response. In order to widen the applicability of sufficient dimension reduction, we propose BCov-SDR, a novel sufficient dimension reduction approach that is based on a recently developed dependence measure: ball covariance. Compared with other popular sufficient dimension reduction methods, our approach requires rather mild conditions on the predictors and the response, and is robust to outliers or heavy-tailed distributions. BCov-SDR does not require the specification of a forward regression model and allows for discrete or categorical predictors and multivariate response. The consistency of the BCov-SDR estimator of the central subspace is obtained without imposing any moment conditions on the predictors. Simulations and real data studies illustrate the applicability and versatility of our proposed method.
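The central idea of BCov-SDR — choosing projection directions that maximize a dependence measure between the projected predictors and the response — can be illustrated with a toy Python sketch. This is not the authors' algorithm: sample distance covariance stands in for ball covariance, and a crude random search replaces the paper's estimation procedure; the function names are ours.

```python
import numpy as np

def distance_covariance(x, y):
    # sample distance covariance between two scalar samples; here it is a
    # stand-in for ball covariance, which plays the same role in the paper
    def centered(a):
        d = np.abs(a[:, None] - a[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    return np.sqrt(max((A * B).mean(), 0.0))

def sdr_first_direction(X, y, n_starts=200, seed=0):
    # crude random search for a single direction beta maximizing the
    # dependence between the projection X @ beta and the response
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_starts):
        b = rng.standard_normal(X.shape[1])
        b /= np.linalg.norm(b)
        val = distance_covariance(X @ b, y)
        if val > best_val:
            best, best_val = b, val
    return best
```

Because the search criterion is a dependence measure rather than a regression fit, no forward model for y given X needs to be specified — which is the point of the dependence-based formulation.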

鲁万波, 陈骋, & 王建业. (2019). A study of irregularly spaced intraday value at risk for asset portfolios. 数理统计与管理, 38, 1104-1118.

Existing research on portfolio value at risk (VaR) has been limited to models built on regularly spaced data. This paper proposes a method for the irregularly spaced intraday value at risk (ISIVaR) of a portfolio, which overcomes the problem that tick-by-tick transaction data for the component assets are irregularly spaced and non-synchronous, and exploits the rich market-microstructure information contained in such data to estimate VaR. The method first synchronizes the non-synchronous marked point processes of the portfolio's assets using the refresh-time scheme; it then builds an irregularly spaced intraday volatility model for the portfolio using copula theory, capturing the cross-sectional dependence among the component assets; finally, exploiting this cross-sectional dependence, it estimates the portfolio's ISIVaR by Monte Carlo simulation. An empirical study using real tick-by-tick transaction data confirms the effectiveness of the proposed method.
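Two steps of the pipeline are easy to sketch in Python: refresh-time synchronization of two tick streams, and a Monte Carlo portfolio VaR under copula-driven cross-sectional dependence. This is a heavily simplified stand-in (a Gaussian copula with normal margins and fixed volatility forecasts rather than the paper's intraday volatility model); the function names are ours.

```python
import numpy as np

def refresh_times(t1, t2):
    # refresh-time synchronization: each sync point is the first instant
    # by which BOTH assets have recorded at least one new trade
    out, i, j = [], 0, 0
    while i < len(t1) and j < len(t2):
        t = max(t1[i], t2[j])
        out.append(t)
        while i < len(t1) and t1[i] <= t:
            i += 1
        while j < len(t2) and t2[j] <= t:
            j += 1
    return out

def copula_mc_var(sigmas, corr, weights, alpha=0.01, n_sim=100_000, seed=0):
    # Monte Carlo VaR sketch: draw dependent shocks from a Gaussian copula
    # (here with normal margins), scale by per-asset volatility forecasts,
    # and read off the loss quantile of the simulated portfolio return
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sim, len(sigmas))) @ np.linalg.cholesky(corr).T
    port = (z * np.asarray(sigmas)) @ np.asarray(weights)
    return -np.quantile(port, alpha)
```

With a non-Gaussian copula or non-normal margins the same simulate-then-quantile structure applies; only the shock generation changes.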

常晋源, Tang, C. Y., & Wu, T. T. (2018). A new scope of penalized empirical likelihood with high-dimensional estimating equations. Annals of Statistics, 46, 3185-3216.

Statistical methods with empirical likelihood (EL) are appealing and effective, especially in conjunction with estimating equations for flexibly and adaptively incorporating data information. It is known that EL approaches encounter difficulties when dealing with high-dimensional problems. To overcome the challenges, we begin our study by investigating high-dimensional EL from a new scope targeting high-dimensional sparse model parameters. We show that the new scope provides an opportunity for relaxing the stringent requirement on the dimensionality of the model parameters. Motivated by the new scope, we then propose a new penalized EL by applying two penalty functions respectively regularizing the model parameters and the associated Lagrange multiplier in the optimizations of EL. By penalizing the Lagrange multiplier to encourage its sparsity, a drastic dimension reduction in the number of estimating equations can be achieved. Most attractively, such a reduction in the dimensionality of the estimating equations can be viewed as a selection among those high-dimensional estimating equations, resulting in a highly parsimonious and effective device for estimating high-dimensional sparse model parameters. Allowing both the dimensionality of the model parameters and the number of estimating equations to grow exponentially with the sample size, our theory demonstrates that the new penalized EL estimator is sparse and consistent, with asymptotically normally distributed nonzero components. Numerical simulations and a real data analysis show that the proposed penalized EL works promisingly.
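The machinery the paper penalizes is the classical EL Lagrange dual. A minimal Python sketch of that building block — low-dimensional EL for a scalar mean, with no penalties — shows where the Lagrange multiplier enters; the doubly penalized high-dimensional version regularizes both the parameter and this multiplier. The function name is ours.

```python
import numpy as np

def el_logratio_mean(x, mu, n_iter=50):
    # classical EL for a scalar mean via the Lagrange dual: Newton-solve
    # sum g_i / (1 + lam * g_i) = 0 with g_i = x_i - mu; by Wilks' theorem
    # the statistic 2 * sum log(1 + lam * g_i) is ~ chi^2_1 under H0
    g = x - mu
    lam = 0.0
    for _ in range(n_iter):
        d = 1.0 + lam * g
        grad = np.sum(g / d)
        hess = -np.sum(g ** 2 / d ** 2)
        lam -= grad / hess
    return 2.0 * np.sum(np.log1p(lam * g))
```

In the high-dimensional setting, g becomes a long vector of estimating functions and lam a vector of the same length; an L1 penalty on lam zeroes out most of its entries, which is exactly the "selection among estimating equations" the abstract describes.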

常晋源, Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. Annals of Statistics, 46, 2094-2124.

We extend principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek a contemporaneous linear transformation for a p-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore, those lower-dimensional series can be analyzed separately as far as the linear dynamic structure is concerned. Technically, it boils down to an eigenanalysis for a positive definite matrix. When p is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed p and diverging p when the sample size n tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analyzing multiple time series data, which leads to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation which leads to advantages in, for example, forecasting future values. The method can also be adapted to segment multiple volatility processes.
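The "eigenanalysis for a positive definite matrix" can be sketched concretely: prewhiten the series, accumulate products of sample autocovariance matrices over a few lags, and rotate by the eigenvectors. This Python sketch (our simplification; it omits the permutation step the paper adds for large p) conveys the shape of the computation.

```python
import numpy as np

def tspca_transform(Y, k0=5):
    # sketch of PCA for vector time series: prewhiten so the contemporaneous
    # covariance is the identity, then eigendecompose the sum over lags of
    # S(k) S(k)', built from sample autocovariances of the whitened series
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    S0 = Yc.T @ Yc / n
    vals, vecs = np.linalg.eigh(S0)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric inverse square root
    Z = Yc @ W
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        Sk = Z[k:].T @ Z[:-k] / n
        M += Sk @ Sk.T
    _, B = np.linalg.eigh(M)
    return Z @ B   # candidate segmented series (up to permutation and sign)
```

Because M is assembled from squared autocovariances, serially uncorrelated groups of components correspond to (approximately) distinct eigenblocks of M, which is what makes the segmentation an eigenanalysis.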

常晋源, Qiu, Y., Yao, Q., & Zou, T. (2018). Confidence regions for entries of a large precision matrix. Journal of Econometrics, 206, 57-82.

We consider the statistical inference for high-dimensional precision matrices. Specifically, we propose a data-driven procedure for constructing a class of simultaneous confidence regions for a subset of the entries of a large precision matrix. The confidence regions can be applied to test for specific structures of a precision matrix, and to recover its nonzero components. We first construct an estimator for the precision matrix via penalized nodewise regression. We then develop the Gaussian approximation to approximate the distribution of the maximum difference between the estimated and the true precision coefficients. A computationally feasible parametric bootstrap algorithm is developed to implement the proposed procedure. The theoretical justification is established under the setting which allows temporal dependence among observations. Therefore the proposed procedure is applicable to both independent and identically distributed data and time series data. Numerical results with both simulated and real data confirm the good performance of the proposed method.
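The nodewise-regression construction of a precision matrix is compact enough to sketch. Here plain least squares (valid when p < n) stands in for the penalized fit the paper uses in high dimensions; with OLS and residual variances divided by n, the construction reproduces the inverse sample covariance exactly, which is a useful sanity check. The function name is ours.

```python
import numpy as np

def nodewise_precision(X):
    # estimate each row of the precision matrix by regressing variable j on
    # all the others: Omega_jj = 1 / tau_j^2, Omega_jk = -beta_jk / tau_j^2,
    # where tau_j^2 is the residual variance of the j-th regression
    # (here plain least squares; the paper uses a penalized fit)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Omega = np.zeros((p, p))
    for j in range(p):
        idx = [k for k in range(p) if k != j]
        beta, *_ = np.linalg.lstsq(Xc[:, idx], Xc[:, j], rcond=None)
        resid = Xc[:, j] - Xc[:, idx] @ beta
        tau2 = resid @ resid / n
        Omega[j, j] = 1.0 / tau2
        Omega[j, idx] = -beta / tau2
    return Omega
```

On top of such an estimator, the paper's confidence regions bootstrap the distribution of the maximum entrywise deviation rather than treating each entry separately.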

常晋源, Delaigle, A., Hall, P., & Tang, C. Y. (2018). A frequency domain analysis of the error distribution from noisy high-frequency data. Biometrika, 105, 353-369.

Data observed at a high sampling frequency are typically assumed to be an additive composite of a relatively slow-varying continuous-time component, a latent stochastic process or smooth random function, and measurement error. Supposing that the latent component is an Itô diffusion process, we propose to estimate the measurement error density function by applying a deconvolution technique with appropriate localization. Our estimator, which does not require equally-spaced observed times, is consistent and minimax rate-optimal. We also investigate estimators of the moments of the error distribution and their properties, propose a frequency domain estimator for the integrated volatility of the underlying stochastic process, and show that it achieves the optimal convergence rate. Simulations and an application to real data validate our analysis.
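The frequency-domain idea can be conveyed with a toy Python sketch under strong simplifying assumptions (a slowly varying latent path and a symmetric error), which is far cruder than the paper's localized, rate-optimal estimator; the function name and the truncation choice are ours.

```python
import numpy as np

def error_density_estimate(y, grid, cutoff=3.0, m=256):
    # toy deconvolution sketch: when the latent path varies slowly, first
    # differences of the observations are approximately eps_i - eps_{i-1};
    # for a symmetric error the difference's characteristic function equals
    # |phi_eps|^2, so phi_eps(t) ~ sqrt(max(Re phi_diff(t), 0)), which we
    # invert by a truncated Fourier integral (the cutoff acts as smoothing)
    d = np.diff(y)
    t = np.linspace(-cutoff, cutoff, m)
    phi_diff = np.real(np.exp(1j * np.outer(t, d)).mean(axis=1))
    phi_eps = np.sqrt(np.clip(phi_diff, 0.0, None))
    dt = t[1] - t[0]
    f = np.array([(np.exp(-1j * t * x) * phi_eps).sum().real * dt
                  for x in grid]) / (2.0 * np.pi)
    return np.clip(f, 0.0, None)
```

The truncation level plays the role of the bandwidth in ordinary deconvolution kernel density estimation: a larger cutoff reduces bias but amplifies the noise in the tail of the empirical characteristic function.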

常晋源, Guo, J., & Tang, C. Y. (2018). Peter Hall’s contribution to empirical likelihood. Statistica Sinica, 28, 2375-2387.

We deeply mourn the loss of Peter Hall. Peter was the premier mathematical statistician of his era. His work illuminated many aspects of statistical thought. While his body of work on the bootstrap and nonparametric smoothing is widely known and appreciated, less well known is his work in many other areas. In this article, we review Peter's contribution to empirical likelihood (EL). Peter did fundamental work on the coverage accuracy of confidence regions constructed with EL.

何婧, & Chen, S. X. (2018). High-dimensional two-sample covariance matrix testing via super-diagonals. Statistica Sinica, 28, 2671-2696.

This paper considers testing for two-sample covariance matrices of high-dimensional populations. We formulate a multiple test procedure by comparing the super-diagonals of the covariance matrices. The asymptotic distributions of the test statistics are derived and the powers of individual tests are studied. The test statistics, by focusing on the super-diagonals, have smaller variation than the existing tests that target the entire covariance matrix. The advantage of the proposed test is demonstrated by simulation studies, as well as an empirical study on a prostate cancer dataset.
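The basic object — comparing a single super-diagonal of two sample covariance matrices — is simple to sketch in Python. Here a permutation test stands in for the paper's asymptotic calibration and multiple-testing correction; the function name is ours.

```python
import numpy as np

def superdiag_perm_test(X1, X2, q, n_perm=200, seed=0):
    # contrast the q-th super-diagonals of the two sample covariance
    # matrices with a max-type statistic; calibrate by permuting sample
    # labels (a stand-in for the paper's asymptotic null distribution)
    rng = np.random.default_rng(seed)

    def stat(A, B):
        dA = np.diagonal(np.cov(A, rowvar=False), offset=q)
        dB = np.diagonal(np.cov(B, rowvar=False), offset=q)
        return np.max(np.abs(dA - dB))

    obs = stat(X1, X2)
    Z, n1 = np.vstack([X1, X2]), len(X1)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        exceed += stat(Z[perm[:n1]], Z[perm[n1:]]) >= obs
    return (exceed + 1) / (n_perm + 1)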

陈磊, Guo, B., Huang, J., 何婧, Wang, H., Zhang, S., & Chen, S. X. (2018). Assessing air-quality in Beijing-Tianjin-Hebei region: The method and mixed tales of PM2.5 and O3. Atmospheric Environment, 193, 290-301.

Motivated by the need to evaluate the effectiveness of a campaign to alleviate the notorious air pollution in China’s Beijing-Tianjin-Hebei (BTH) region, we outline a temporal statistical adjustment method whose ability to remove the meteorological confounding present in air quality data is demonstrated from several aspects. The adjustment makes the adjusted average concentrations temporally comparable, and hence they can be used to evaluate the effectiveness of emission reduction strategies over time. Applying the method to four major pollutants from 73 air quality monitoring sites along with meteorological data, the adjusted averages indicate substantial regional reductions from 2013 to 2016 of 27% in PM2.5 and 51% in SO2, benefiting from the elimination of high-energy-consumption and high-polluting equipment and a 20.7% decline in coal consumption, while average NO2 levels remained nearly static with a mere 4.5% decline. Our study also reveals a significant increase of 11.3% in ground-level O3. These results suggest that future air quality management plans in BTH have to be based on the dual targets of PM2.5 and O3.
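The logic of a meteorological adjustment can be sketched as a regression: fit pollutant concentrations on period indicators plus weather covariates, and read the period coefficients as averages placed on a common meteorological footing. This is a generic regression-style sketch, not the authors' adjustment procedure; the function name and variables are ours.

```python
import numpy as np

def adjusted_yearly_means(y, met, year):
    # regress pollutant levels on year dummies plus meteorological
    # covariates; the year coefficients are the weather-adjusted average
    # concentrations, comparable across years even if weather differed
    years = np.unique(year)
    D = (year[:, None] == years[None, :]).astype(float)   # year indicators
    M = np.column_stack([D, met])
    beta, *_ = np.linalg.lstsq(M, y, rcond=None)
    return dict(zip(years, beta[:len(years)]))
```

When the weather distribution shifts between years, raw yearly means mix emission changes with meteorology, while the regression coefficients isolate the emission component — the confounding the abstract refers to.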

常晋源, Zheng, C., Zhou, W. X., & Zhou, W. (2017). Simulation‐based hypothesis testing of high dimensional means under covariance heterogeneity. Biometrics, 73, 1300-1310.

In this article, we study the problem of testing the mean vectors of high-dimensional data in both one-sample and two-sample cases. The proposed testing procedures employ maximum-type statistics and parametric bootstrap techniques to compute the critical values. Different from existing tests that rely heavily on structural conditions on the unknown covariance matrices, the proposed tests allow general covariance structures of the data and therefore enjoy a wide scope of applicability in practice. To enhance the powers of the tests against sparse alternatives, we further propose two-step procedures with a preliminary feature screening step. Theoretical properties of the proposed tests are investigated. Through extensive numerical experiments on synthetic data sets and a human acute lymphoblastic leukemia gene expression data set, we illustrate the performance of the new tests and how they may provide assistance in detecting disease-associated gene sets. The proposed methods have been implemented in the R package HDtest and are available on CRAN.
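The HDtest package implements the actual procedures; the shape of a max-type test with bootstrap critical values can nonetheless be sketched in a few lines of Python. Here a Gaussian multiplier bootstrap stands in for the paper's parametric bootstrap, and the screening step is omitted; the function name is ours.

```python
import numpy as np

def max_mean_test(X, n_boot=500, seed=0):
    # one-sample max-type test of H0: mean vector = 0; the critical value
    # comes from a Gaussian multiplier bootstrap applied to the centered,
    # studentized data, so no structure is imposed on the covariance
    rng = np.random.default_rng(seed)
    n, p = X.shape
    xbar, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    T = np.sqrt(n) * np.max(np.abs(xbar / sd))
    Xc = (X - xbar) / sd
    Tb = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)          # multiplier weights
        Tb[b] = np.max(np.abs(Xc.T @ e)) / np.sqrt(n)
    pval = (1 + np.sum(Tb >= T)) / (1 + n_boot)
    return T, pval
```

Because the bootstrap replicates inherit the empirical covariance of the data, the calibration adapts automatically to whatever dependence structure the coordinates have — the covariance heterogeneity in the title.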
