Research Findings

Chang, J., Fang, Q., Qiao, X., & Yao, Q. (2024+). On the modeling and prediction of high-dimensional functional time series. Journal of the American Statistical Association, in press.

We propose a two-step procedure to model and predict high-dimensional functional time series, where the number of function-valued time series p is large in relation to the length of the time series n. Our first step performs an eigenanalysis of a positive definite matrix, which leads to a one-to-one linear transformation of the original high-dimensional functional time series; the transformed curve series can be segmented into several groups such that any two subseries from two different groups are uncorrelated both contemporaneously and serially. Consequently, in our second step those groups are handled separately without losing information on the overall linear dynamic structure. The second step is devoted to establishing a finite-dimensional dynamical structure for all the transformed functional time series within each group. Furthermore, the finite-dimensional structure is represented by that of a vector time series. Modeling and forecasting for the original high-dimensional functional time series are realized via those for the vector time series in all the groups. We investigate the theoretical properties of our proposed methods, and illustrate the finite-sample performance through both extensive simulations and two real datasets. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.


Chen, X., Deng, C., He, S., Wu, R., & Zhang, J. (2024). High-dimensional sparse single-index regression via Hilbert-Schmidt independence criterion. Statistics and Computing, 34, 86.

The Hilbert-Schmidt Independence Criterion (HSIC) has recently been introduced to the field of single-index models to estimate the directions. Compared with other well-established methods, the HSIC-based method requires relatively weak conditions. However, its performance has not yet been studied in the prevalent high-dimensional scenarios, where the number of covariates can be much larger than the sample size. In this article, based on HSIC, we propose to estimate the possibly sparse directions in high-dimensional single-index models through a parameter reformulation. Our approach estimates the subspace of the direction directly and performs variable selection simultaneously. Due to the non-convexity of the objective function and the complexity of the constraints, a majorize-minimize algorithm together with the linearized alternating direction method of multipliers is developed to solve the optimization problem. Since it does not involve the inverse of the covariance matrix, the algorithm can naturally handle large-p-small-n scenarios. Through extensive simulation studies and a real data analysis, we show that our proposal is efficient and effective in high-dimensional settings. The Matlab codes for this method are available online.
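As a point of reference, the biased empirical HSIC with Gaussian kernels, the standard estimator from the kernel-independence literature rather than the paper's full sparse procedure, takes only a few lines; the function name and bandwidth below are illustrative:

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    # Biased empirical HSIC: trace(K H L H) / (n - 1)^2 with Gaussian
    # kernel Gram matrices K, L and the centring matrix H.
    n = len(x)
    x = x.reshape(n, -1)
    y = y.reshape(n, -1)
    def gram(z):
        sq = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
        return np.exp(-sq / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = gram(x), gram(y)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A strongly dependent pair yields a markedly larger value than an independent pair, which is what makes HSIC usable as an objective for direction estimation.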


Chang, J., Hu, Q., Liu, C., & Tang, C. Y. (2024). Optimal covariance matrix estimation for high-dimensional noise in high-frequency data. Journal of Econometrics, 239, 105329.

We consider high-dimensional measurement errors with high-frequency data. Our objective is to recover, with optimality, the high-dimensional cross-sectional covariance matrix of the random errors. In this problem, not all components of the random vector are observed at the same time and the measurement errors are latent variables, leading to major challenges beyond high data dimensionality.
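A much-simplified synchronous toy version of the identification idea (not the paper's estimator, which also handles asynchronous observation): when each observed series is a latent random walk plus i.i.d. noise, the lag-one autocovariance of the differenced series equals minus the noise covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5000, 3
# Assumed noise covariance for the simulation (illustrative values).
Sigma_e = np.array([[1.0, 0.3, 0.0],
                    [0.3, 1.0, 0.2],
                    [0.0, 0.2, 1.0]])
price = np.cumsum(0.1 * rng.standard_normal((n, p)), axis=0)   # latent random walk
noise = rng.multivariate_normal(np.zeros(p), Sigma_e, size=n)  # i.i.d. errors
Y = price + noise                                              # observed series
dY = np.diff(Y, axis=0)
# E[dY_t dY_{t-1}^T] = -Sigma_e, so negate the lag-1 autocovariance.
Sigma_hat = -(dY[1:].T @ dY[:-1]) / (n - 1)
Sigma_hat = (Sigma_hat + Sigma_hat.T) / 2  # symmetrise the moment estimate
```

The identity follows because dY_t = Δprice_t + e_t − e_{t−1}, so adjacent differences share only the term −e_{t−1}.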


Chang, J., Chen, C., Qiao, X., & Yao, Q. (2024). An autocovariance-based learning framework for high-dimensional functional time series. Journal of Econometrics, 239, 105385.

Many scientific and economic applications involve the statistical learning of high-dimensional functional time series, where the number of functional variables is comparable to, or even greater than, the number of serially dependent functional observations. In this paper, we model observed functional time series, which are subject to errors in the sense that each functional datum arises as the sum of two uncorrelated components, one dynamic and one white noise.
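The key autocovariance trick can be sketched in the vector case: white noise contributes nothing to lag-k autocovariances for k ≥ 1, so those lags identify the dynamic component alone. The helper below, with its illustrative name and defaults, is a simplified stand-in for the paper's functional procedure:

```python
import numpy as np

def dynamic_subspace(Y, max_lag=5, r=2):
    # For Y_t = X_t + e_t with white noise e_t, the lag-k autocovariances
    # (k >= 1) involve only the dynamics, so the top eigenvectors of
    # M = sum_k C_k C_k^T span the dynamic subspace.
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, max_lag + 1):
        Ck = Yc[k:].T @ Yc[: n - k] / n
        M += Ck @ Ck.T
    vals, vecs = np.linalg.eigh(M)      # ascending eigenvalues
    return vecs[:, ::-1][:, :r]         # top-r eigenvectors
```

On a one-factor simulation the leading eigenvector recovers the loading direction despite the additive noise.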


Chang, J., Hu, Q., Kolaczyk, E. D., Yao, Q., & Yi, F. (2024). Edge differentially private estimation in the β-model via jittering and method of moments. Annals of Statistics, 52, 708-728.

A standing challenge in data privacy is the trade-off between the level of privacy and the efficiency of statistical inference. Here we conduct an in-depth study of this trade-off for parameter estimation in the β-model (Chatterjee, Diaconis and Sly, 2011) for edge differentially private network data released via jittering (Karwa, Krivitsky and Slavković, 2017). Unlike most previous approaches based on maximum likelihood estimation for this network model, we proceed via method-of-moments. This choice facilitates our exploration of a substantially broader range of privacy levels – corresponding to stricter privacy – than has been explored to date. Over this new range we discover that our proposed estimator of the parameters exhibits an interesting phase transition, with both its convergence rate and asymptotic variance following one of three different regimes of behavior depending on the level of privacy. Because identification of the operable regime is difficult if not impossible in practice, we devise a novel adaptive bootstrap procedure to construct uniform inference across different phases. In fact, leveraging this bootstrap we are able to provide for simultaneous inference of all parameters in the β-model (i.e., equal in number to the number of nodes), which, to the best of our knowledge, is the first result of its kind. Numerical experiments confirm the competitive and reliable finite-sample performance of the proposed inference methods, next to a comparable maximum likelihood method, as well as significant advantages in terms of computational speed and memory.
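The jittering mechanism itself is randomized response on edge indicators, and the method-of-moments flavour already shows up in debiasing the released degrees; the function names below are illustrative sketches, not the paper's full estimator:

```python
import numpy as np

def jitter_graph(A, eps, rng):
    # Flip each edge indicator independently with probability
    # q = 1 / (1 + e^eps); since (1 - q) / q = e^eps, this randomised
    # response gives eps-edge differential privacy.
    p_flip = 1.0 / (1.0 + np.exp(eps))
    n = A.shape[0]
    flips = np.triu(rng.random((n, n)) < p_flip, 1)
    flips = flips | flips.T            # keep the released graph symmetric
    return np.where(flips, 1 - A, A)

def debias_degrees(A_priv, eps):
    # Method-of-moments correction: the jittered degree has expectation
    # (1 - 2q) d_i + q (n - 1), so invert that affine map.
    n = A_priv.shape[0]
    q = 1.0 / (1.0 + np.exp(eps))
    return (A_priv.sum(axis=1) - q * (n - 1)) / (1 - 2 * q)
```

The debiased degrees are unbiased for the true degrees, which is the kind of moment equation the β-model estimator builds on.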


Chang, J., He, J., Kang, J., & Wu, M. (2024). Statistical inferences for complex dependence of multimodal imaging data. Journal of the American Statistical Association, 119, 1486-1499.

Statistical analysis of multimodal imaging data is a challenging task, since the data involve high dimensionality, strong spatial correlations and complex data structures. In this article, we propose rigorous statistical testing procedures for making inferences on the complex dependence of multimodal imaging data. Motivated by the analysis of multi-task fMRI data in the Human Connectome Project (HCP) study, we particularly address three hypothesis testing problems: (a) testing independence among imaging modalities over brain regions, (b) testing independence between brain regions within imaging modalities, and (c) testing independence between brain regions across different modalities. Considering a general form for all three tests, we develop a global testing procedure and a multiple testing procedure controlling the false discovery rate. We study theoretical properties of the proposed tests and develop a computationally efficient distributed algorithm. The proposed methods and theory are general and relevant for many statistical problems of testing independence structure among the components of high-dimensional random vectors with arbitrary dependence structures. We also illustrate our proposed methods via extensive simulations and analysis of five task fMRI contrast maps in the HCP study. Supplementary materials for this article are available online.
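A generic stand-in for such global tests (far simpler than the paper's procedures, which handle high dimensionality and spatial correlation): take the largest absolute cross-correlation between two blocks of variables and calibrate it by permutation. The function name and defaults are illustrative:

```python
import numpy as np

def max_corr_test(X, Y, n_perm=200, rng=None):
    # Max-type statistic: largest absolute cross-correlation between the
    # columns of X and Y, with a permutation null that breaks dependence.
    if rng is None:
        rng = np.random.default_rng()
    def stat(A, B):
        As = (A - A.mean(0)) / A.std(0)
        Bs = (B - B.mean(0)) / B.std(0)
        return np.max(np.abs(As.T @ Bs)) / len(A)
    t_obs = stat(X, Y)
    null = [stat(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)]
    pval = (1 + sum(t >= t_obs for t in null)) / (n_perm + 1)
    return t_obs, pval
```

Max-type statistics of this kind are a common choice for detecting sparse dependence between high-dimensional blocks.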


Chang, J., Chen, X., & Wu, M. (2024). Central limit theorems for high dimensional dependent data. Bernoulli, 30, 712-742.

Motivated by statistical inference problems in high-dimensional time series data analysis, we first derive non-asymptotic error bounds for Gaussian approximations of sums of high-dimensional dependent random vectors on hyper-rectangles, simple convex sets and sparsely convex sets. We investigate the quantitative effect of temporal dependence on the rates of convergence to a Gaussian random vector over three different dependency frameworks (α-mixing, m-dependent, and physical dependence measure). In particular, we establish new error bounds under the α-mixing framework and derive faster rates than existing results under the physical dependence measure. To implement the proposed results in practical statistical inference problems, we also derive a data-driven parametric bootstrap procedure based on a kernel-type estimator for the long-run covariance matrices. The unified Gaussian and parametric bootstrap approximation results can be used to test mean vectors with combined ℓ2- and ℓ∞-type statistics, perform change-point detection, and construct confidence regions for covariance and precision matrices, all for time series data.
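The kernel-type long-run covariance estimator behind the bootstrap can be sketched with Bartlett (Newey-West) weights; the bandwidth rule below is an illustrative default, not the paper's data-driven choice:

```python
import numpy as np

def long_run_cov(X, bandwidth=None):
    # Bartlett-weighted sum of sample autocovariances; the triangular
    # weights w_k = 1 - k / (b + 1) guarantee a positive semi-definite
    # estimate of the long-run covariance matrix.
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    b = bandwidth or int(np.floor(n ** (1 / 3)))
    S = Xc.T @ Xc / n
    for k in range(1, b + 1):
        w = 1 - k / (b + 1)
        Gk = Xc[k:].T @ Xc[: n - k] / n
        S += w * (Gk + Gk.T)
    return S
```

In the parametric bootstrap one then draws Gaussian vectors with this covariance to calibrate, e.g., an ℓ∞-type statistic.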


Liu, S., Luo, J., Zhang, Y., Wang, H., Yu, Y., & Xu, Z. (2024). Efficient privacy-preserving Gaussian process via secure multi-party computation. Journal of Systems Architecture, 151, 103134.

Gaussian processes (GPs), known for their flexibility as non-parametric models, have been widely used in practice involving sensitive data (e.g., healthcare, finance) from multiple sources. With the challenge of data isolation and the need for high-performance models, how to jointly develop privacy-preserving GPs for multiple parties has emerged as a crucial topic. In this paper, we propose a new privacy-preserving GP algorithm, namely PP-GP, which employs secret sharing (SS) techniques. Specifically, we introduce a new SS-based exponentiation operation (PP-Exp) through confusion correction and an SS-based matrix inversion operation (PP-MI) based on Cholesky decomposition. However, the advantages of the GP come with a great computational burden and space cost. To further enhance the efficiency, we propose an efficient split learning framework for privacy-preserving GP, named Split-GP, which demonstrably improves performance on large-scale data. We leave the private data-related and SMPC-hostile computations (i.e., random features) on data holders, and delegate the rest of the SMPC-friendly computations (i.e., low-rank approximation, model construction, and prediction) to semi-honest servers. The resulting algorithm significantly reduces computational and communication costs compared to PP-GP, making it well-suited for application to large-scale datasets. We provide a theoretical analysis in terms of the correctness and security of the proposed SS-based operations. Extensive experiments show that our methods can achieve competitive performance and efficiency under the premise of preserving privacy.
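The primitive underneath SS-based operations is additive secret sharing: a value is split so that only the sum of all shares reveals it. The sketch below shows just this primitive (the PP-Exp and PP-MI protocols themselves are not reproduced, and the function names are illustrative):

```python
import numpy as np

def share(x, n_parties, rng, modulus=2**31 - 1):
    # Split integer x into n additive shares over Z_m: the first n-1 are
    # uniform random, and the last absorbs the difference, so any strict
    # subset of shares reveals nothing about x.
    shares = rng.integers(0, modulus, size=n_parties - 1)
    last = (x - shares.sum()) % modulus
    return np.append(shares, last)

def reconstruct(shares, modulus=2**31 - 1):
    # Only the sum of all shares (mod m) recovers the secret.
    return int(shares.sum() % modulus)
```

Linear operations can be performed directly on shares; non-linear steps such as exponentiation and matrix inversion are what dedicated protocols like the paper's PP-Exp and PP-MI address.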


Liu, S., Shi, W., Lv, S., Zhang, Y., Wang, H., & Xu, Z. (2024). Meta-learning via PAC-Bayesian with data-dependent prior: generalization bounds from local entropy. International Joint Conferences on Artificial Intelligence (IJCAI).

Meta-learning accelerates the learning process on unseen learning tasks by acquiring prior knowledge through previous related tasks. The PAC-Bayesian theory provides a theoretical framework to analyze the generalization of meta-learning to unseen tasks. However, previous works still encounter two notable limitations: (1) they merely focus on data-free priors, which often result in inappropriate regularization and loose generalization bounds; (2) more importantly, their optimization process usually involves nested optimization problems, incurring significant computational costs. To address these issues, we derive new generalization bounds and introduce a novel PAC-Bayesian framework for meta-learning that integrates data-dependent priors. This framework enables the extraction of optimal posteriors for each task in closed form, thereby allowing us to minimize generalization bounds incorporating data-dependent priors with only a simple local entropy. The resulting algorithm, which employs SGLD for sampling from the optimal posteriors, is stable, efficient, and computationally lightweight, eliminating the need for nested optimization. Extensive experimental results demonstrate that our proposed method outperforms the other baselines.
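The SGLD sampler the algorithm relies on is a one-line update: a gradient step on the log-posterior plus suitably scaled Gaussian noise. A minimal sketch (shown with a full-batch gradient, so strictly plain Langevin dynamics; the function name is illustrative):

```python
import numpy as np

def sgld_step(theta, grad_log_post, step, rng):
    # theta <- theta + (step/2) * grad log p(theta | data) + sqrt(step) * N(0, I)
    noise = rng.standard_normal(np.shape(theta))
    return theta + 0.5 * step * grad_log_post(theta) + np.sqrt(step) * noise
```

Iterating this update produces (approximate) samples from the posterior; e.g., with grad_log_post(t) = -t the chain equilibrates near a standard normal.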


Li, C., & Hu, Y. (2024). Analysis of the spatiotemporal velocity of annual precipitation based on random field. Communications in Statistics-Theory and Methods, 1-18.

Changes in precipitation directly impact river runoff volume, subsequently influencing food production and the security of downstream urban areas. In this study, we introduce a random velocity field (RVF) capable of performing multi-step predictions while providing interpretable insights into precipitation variations. The RVF leverages the gradient of a Gaussian random field to learn spatiotemporal velocity patterns and employs a predictive process to reduce dimensionality and enable multi-step forecasting. Bayesian parameter estimates are obtained using the Markov chain Monte Carlo (MCMC) method. Our analysis reveals a noticeable shifting trend in annual precipitation based on diverse real datasets. This trend serves as a valuable foundation for further exploration of urban flood control and agricultural development strategies.
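The Bayesian estimation step rests on standard MCMC; a minimal random-walk Metropolis sampler (the RVF likelihood itself is not reproduced, and the function name is illustrative) looks like:

```python
import numpy as np

def metropolis(log_post, theta0, n_iter, step, rng):
    # Random-walk Metropolis: propose theta + step * N(0, I) and accept
    # with probability min(1, p(prop) / p(theta)).
    theta, lp = theta0, log_post(theta0)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(np.shape(theta))
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws.append(theta)
    return np.array(draws)
```

Run against a simple Gaussian log-posterior, the chain recovers the target's mean and variance after burn-in.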
