Research Findings

Yu, G., Li, Q., Wang, J., Zhang, D., & Liu, Y. (2020). A multimodal generative and fusion framework for recognizing faculty homepages. Information Sciences, 525, 205-220.

Multimodal data consist of several data modes, where each mode is a group of similar data sharing the same attributes. Recognizing faculty homepages is essentially a multimodal classification problem in which a target faculty homepage is determined from three different information sources, including text, images, and layout. Conventional strategies in previous studies have been either to concatenate features from various information sources into a compound vector or to input them separately into several different classifiers that are then assembled into a stronger classifier for the final prediction. However, both approaches ignore the connections among different feature sets. We argue that such relations are essential to enhance multimodal classification. Besides, recognizing faculty homepages is a class imbalance problem in which the total number of samples of a minority class is far smaller than the sample numbers of other classes. In this study, we propose a multimodal generative and fusion framework for multimodal learning with the problems of imbalanced data and mutually dependent feature modes. Specifically, a multimodal generative adversarial network is first introduced to rebalance the dataset by generating pseudo features based on each mode and combining them to describe a fake sample. Then, a gated fusion network with the gate and fusion mechanisms is presented to reduce the noise to improve the generalization ability and capture the links among the different feature modes. Experiments on a faculty homepage dataset show the superiority of the proposed framework.
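A minimal sketch of the gate idea, assuming a simple sum-of-gated-projections design in PyTorch; the dimensions and architecture below are illustrative, not the paper's actual network:

```python
# Minimal sketch of a gated fusion layer over three modality feature vectors
# (text, image, layout). A hypothetical simplification: the dimensions and
# the sum-of-gated-projections design are illustrative assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dims, fused_dim):
        super().__init__()
        self.projs = nn.ModuleList(nn.Linear(d, fused_dim) for d in dims)
        self.gates = nn.ModuleList(nn.Linear(d, fused_dim) for d in dims)

    def forward(self, feats):
        fused = 0.0
        for proj, gate, x in zip(self.projs, self.gates, feats):
            # The sigmoid gate down-weights noisy dimensions of each modality
            # before the modalities are summed into a joint representation.
            fused = fused + torch.sigmoid(gate(x)) * torch.tanh(proj(x))
        return fused

fusion = GatedFusion([300, 512, 64], fused_dim=128)
feats = [torch.randn(8, 300), torch.randn(8, 512), torch.randn(8, 64)]
print(fusion(feats).shape)   # torch.Size([8, 128])
```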

Zhang, J., & Chen, X. (2020). Principal envelope model. Journal of Statistical Planning and Inference, 206, 249-262.

Principal component analysis (PCA) is widely used in various fields to reduce high-dimensional data sets to lower dimensions. Traditionally, the first few principal components that capture most of the variance in the data are thought to be important. Tipping and Bishop (1999) introduced probabilistic principal component analysis (PPCA), in which they assumed an isotropic error in a latent variable model. Motivated by a general error structure and incorporating the novel idea of "envelope" proposed by Cook et al. (2010), we construct principal envelope models (PEM), which demonstrate the possibility that any subset of the principal components could retain most of the sample's information. The useful principal components can be found through maximum likelihood approaches. We also embed the PEM in a factor model setting to illustrate its reasonableness and validity. Numerical results indicate the potential of the proposed method.
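The idea that any subset of principal components can be compared on likelihood grounds can be illustrated with the PPCA log-likelihood of Tipping and Bishop (1999); the sketch below shows only that scoring step, not the PEM estimator itself:

```python
# Sketch: score an arbitrary subset of principal components by the PPCA
# log-likelihood of Tipping & Bishop (1999), with the discarded eigenvalues
# averaged into an isotropic noise variance. Illustrative, not the PEM fit.
import numpy as np

def ppca_loglik_for_subset(X, subset):
    n, p = X.shape
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]   # eigenvalues, descending
    keep = np.zeros(p, dtype=bool)
    keep[list(subset)] = True
    q = keep.sum()
    sigma2 = lam[~keep].mean()               # isotropic noise from discarded eigenvalues
    return -0.5 * n * (p * np.log(2 * np.pi) + np.log(lam[keep]).sum()
                       + (p - q) * np.log(sigma2) + p)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6)) * [3.0, 2.0, 1.0, 0.5, 0.5, 0.5]
print(ppca_loglik_for_subset(X, [0, 1]))     # the two leading components
print(ppca_loglik_for_subset(X, [2, 3]))     # an interior subset, for comparison
```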

Li, C., Song, Z., & Wang, W. (2020). Space-time inhomogeneous background intensity estimators for semi-parametric space-time self-exciting point process models. Annals of the Institute of Statistical Mathematics, 72, 945-967.

Histogram maximum likelihood estimators of semi-parametric space–time self-exciting point process models via the expectation–maximization (EM) algorithm can be biased when the background process is inhomogeneous. We explore an alternative estimation method based on variable bandwidth kernel density estimation (KDE) and the EM algorithm. The proposed estimation method involves expanding the semi-parametric models by incorporating an inhomogeneous background process in space and time and applying the variable bandwidth KDE to estimate the background intensity function. Using an example, we show how the variable bandwidth KDE can be estimated this way. Two simulation examples based on residual analysis are designed to evaluate and validate the ability of our methods to recover the background intensity function and the parametric triggering intensity function.
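The key ingredient, a variable bandwidth KDE in which each point receives its own bandwidth, can be sketched with a k-nearest-neighbour bandwidth rule; the Gaussian kernel and k = 15 below are assumptions for illustration, not the paper's exact choices:

```python
# Sketch of a variable-bandwidth Gaussian KDE in two dimensions, with each
# point's bandwidth set to the distance to its k-th nearest neighbour.
import numpy as np
from scipy.spatial import cKDTree

def variable_bandwidth_kde(points, grid, k=15):
    # Per-point bandwidth: distance to the k-th nearest neighbour
    # (k + 1 because the nearest hit of each point is itself).
    h = cKDTree(points).query(points, k=k + 1)[0][:, -1]
    diff = grid[:, None, :] - points[None, :, :]            # (m, n, 2)
    sq = (diff ** 2).sum(axis=2) / h[None, :] ** 2
    return (np.exp(-0.5 * sq) / (2 * np.pi * h ** 2)).sum(axis=1) / len(points)

rng = np.random.default_rng(1)
pts = rng.standard_normal((400, 2)) * [1.0, 0.3]            # anisotropic cluster
gx, gy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-2, 2, 50))
grid = np.column_stack([gx.ravel(), gy.ravel()])
cell = (6 / 49) * (4 / 49)                                  # grid cell area
print(round(variable_bandwidth_kde(pts, grid).sum() * cell, 2))   # ~1: mass in window
```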

Zhang, J., Shi, H., Tian, L., & Xiao, F. (2019). Penalized generalized empirical likelihood in high-dimensional weakly dependent data. Journal of Multivariate Analysis, 171, 270-283.

In this paper, we propose a penalized generalized empirical likelihood (PGEL) approach based on the smoothed moment functions of Anatolyev (2005) and Smith (1997, 2004) for parameter estimation and variable selection in the growing (high) dimensional weakly dependent time series setting. The dimensions of the parameters and moment restrictions are both allowed to grow with the sample size at some moderate rates. The asymptotic properties of the estimators of the smoothed generalized empirical likelihood (SGEL) and its penalized version (SPGEL) are then obtained by properly restricting the degree of data dependence. It is shown that the SPGEL estimator maintains the oracle property despite the existence of data dependence and growing (high) dimensionality. We finally present simulation results and a real data analysis to illustrate the finite-sample performance and applicability of our proposed method.
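The smoothing step itself is simple to illustrate: each moment function is replaced by a kernel-weighted moving average over time. A minimal sketch, assuming a truncated uniform kernel and an arbitrary bandwidth m:

```python
# Sketch: kernel-smoothed moment functions in the spirit of Smith (2004),
# replacing each g(x_t, theta) by a truncated uniform moving average over
# neighbouring time points. The kernel and bandwidth m are assumptions.
import numpy as np

def smoothed_moments(g, m=3):
    """g: (T, r) array of moment functions; returns the (T, r) smoothed version."""
    T = len(g)
    out = np.empty_like(g, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - m), min(T, t + m + 1)
        out[t] = g[lo:hi].mean(axis=0)       # uniform weights, truncated at the ends
    return out

rng = np.random.default_rng(2)
e = np.zeros(200)
for t in range(1, 200):                       # AR(1) errors: serially dependent moments
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()
g = np.column_stack([e, e ** 2 - e.var()])    # two moment functions at the "truth"
print(smoothed_moments(g).shape)              # (200, 2)
```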

Zhang, J., & Chen, X. (2019). Robust sufficient dimension reduction via ball covariance. Computational Statistics & Data Analysis, 140, 144-154.

Sufficient dimension reduction is an important branch of dimension reduction, which includes variable selection and projection methods. Most existing sufficient dimension reduction methods are sensitive to outliers and heavy-tailed predictors, and require strict restrictions on the predictors and the response. In order to widen the applicability of sufficient dimension reduction, we propose BCov-SDR, a novel sufficient dimension reduction approach that is based on a recently developed dependence measure: ball covariance. Compared with other popular sufficient dimension reduction methods, our approach requires rather mild conditions on the predictors and the response, and is robust to outliers or heavy-tailed distributions. BCov-SDR does not require the specification of a forward regression model and allows for discrete or categorical predictors and multivariate response. The consistency of the BCov-SDR estimator of the central subspace is obtained without imposing any moment conditions on the predictors. Simulations and real data studies illustrate the applicability and versatility of our proposed method.
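The overall estimation pattern, choosing the projection that maximizes a robust dependence measure between the projected predictors and the response, can be sketched as follows; note that distance correlation stands in here for ball covariance, and a crude random search stands in for the actual optimization:

```python
# Sketch of dependence-maximization SDR for a single direction. Distance
# correlation replaces ball covariance (the paper's measure), and a random
# search replaces the real optimizer. Illustrative only.
import numpy as np

def dcor(x, y):
    """Sample distance correlation between two 1-D arrays."""
    def centered(d):
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    a = centered(np.abs(x[:, None] - x[None, :]))
    b = centered(np.abs(y[:, None] - y[None, :]))
    return (a * b).mean() / np.sqrt((a * a).mean() * (b * b).mean())

rng = np.random.default_rng(3)
n, p = 200, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
Y = np.sin(X @ beta) + 0.1 * rng.standard_normal(n)      # single-index model

# Random search over unit directions for the most dependent projection.
cands = rng.standard_normal((500, p))
cands /= np.linalg.norm(cands, axis=1, keepdims=True)
best = max(cands, key=lambda b: dcor(X @ b, Y))
print(abs(best @ beta))   # large (near 1) when the direction is roughly recovered
```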

Li, C., Song, Z., Wang, W., & Wang, X. S. (2019). Mining periodic patterns and cascading bursts phenomenon in individual e-mail communication. Journal of Applied Statistics, 46, 2603-2626.

Quantitative understanding of human activity is very important, as many social and economic trends are driven by human actions. We propose a novel stochastic process, the Multi-state Markov Cascading Non-homogeneous Poisson Process (M2CNPP), to analyze human e-mail communication involving both periodic patterns and the bursting phenomenon. The model parameters are estimated using the Generalized Expectation Maximization (GEM) algorithm, with the hidden states treated as missing values. The empirical results demonstrate that the proposed model adequately captures the major temporal cascading features as well as the periodic patterns in e-mail communication.
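The periodic component of such a model can be illustrated in isolation by simulating a non-homogeneous Poisson process with a diurnal intensity via Lewis–Shedler thinning; the intensity function below is an assumed toy example, and the multi-state cascading layer is omitted:

```python
# Sketch: simulate the periodic (non-homogeneous Poisson) component of an
# e-mail arrival model by Lewis-Shedler thinning. The diurnal intensity is
# an assumed toy function; the cascading/burst layer is not modelled here.
import numpy as np

def intensity(t):
    # Diurnal cycle in events/hour: peaks mid-day, near zero overnight.
    return 4.0 * np.sin(np.pi * (t % 24) / 24) ** 2

def simulate_nhpp(t_end, lam_max, rng):
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)          # candidate from a homogeneous process
        if t > t_end:
            return np.array(events)
        if rng.uniform() < intensity(t) / lam_max:   # accept with prob intensity/lam_max
            events.append(t)

rng = np.random.default_rng(4)
arrivals = simulate_nhpp(t_end=24 * 7, lam_max=4.0, rng=rng)   # one simulated week
print(len(arrivals), "e-mails in a simulated week")
```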

Lu, W., Chen, C., & Wang, J. (2019). A study on the irregularly spaced intraday value at risk for portfolio selection. Journal of Applied Statistics and Management, 38, 1104-1118.

This research first proposes a method to estimate the Irregularly Spaced Intraday Value at Risk (ISIVaR) and to address the problems arising from irregularly spaced and asynchronous multivariate tick-by-tick data. First, the autocorrelation duration model is used to fit the price durations of each single asset in the portfolio. Then, based on the durations, each asset's intraday volatility and ISIVaR are estimated. Next, the price event sequences of the portfolio are synchronized by the Fresh Time method. Copula theory is then used to model the irregularly spaced intraday volatility in order to capture the cross-sectional correlation between the assets in the portfolio. Finally, based on this cross-sectional correlation, the ISIVaR of the portfolio is estimated by the Monte Carlo simulation method. An empirical study is presented at the end to validate the feasibility of the proposed method.
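The final copula-plus-Monte-Carlo stage can be sketched on its own. In the toy example below, a Gaussian copula ties together two assumed heavy-tailed marginals and the portfolio VaR is read off the simulated loss quantile; the duration modelling and Fresh Time synchronization stages are omitted, and all numbers are illustrative:

```python
# Sketch of the final stage only: Gaussian-copula Monte Carlo VaR for a
# two-asset portfolio. Marginals, correlation, weights, and confidence
# level are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_sim, rho = 100_000, 0.6
w = np.array([0.5, 0.5])                       # portfolio weights

# 1. Correlated normals -> correlated uniforms (the Gaussian copula).
L = np.linalg.cholesky([[1.0, rho], [rho, 1.0]])
z = rng.standard_normal((n_sim, 2)) @ L.T
u = stats.norm.cdf(z)

# 2. Map uniforms through the assumed marginals (heavy-tailed t returns).
r1 = stats.t.ppf(u[:, 0], df=4) * 0.01
r2 = stats.t.ppf(u[:, 1], df=4) * 0.015
loss = -(w[0] * r1 + w[1] * r2)

# 3. VaR as the 99% quantile of simulated portfolio losses.
print("99% VaR:", np.quantile(loss, 0.99))
```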

Chang, J., Tang, C. Y., & Wu, T. T. (2018). A new scope of penalized empirical likelihood with high-dimensional estimating equations. Annals of Statistics, 46, 3185-3216.

Statistical methods with empirical likelihood (EL) are appealing and effective, especially in conjunction with estimating equations for flexibly and adaptively incorporating data information. It is known that EL approaches encounter difficulties when dealing with high-dimensional problems. To overcome the challenges, we begin our study by investigating high-dimensional EL from a new scope targeting high-dimensional sparse model parameters. We show that the new scope provides an opportunity for relaxing the stringent requirement on the dimensionality of the model parameters. Motivated by the new scope, we then propose a new penalized EL by applying two penalty functions that respectively regularize the model parameters and the associated Lagrange multiplier in the optimizations of EL. By penalizing the Lagrange multiplier to encourage its sparsity, a drastic dimension reduction in the number of estimating equations can be achieved. Most attractively, such a reduction in the dimensionality of the estimating equations can be viewed as a selection among those high-dimensional estimating equations, resulting in a highly parsimonious and effective device for estimating high-dimensional sparse model parameters. Allowing both the dimensionalities of the model parameters and the estimating equations to grow exponentially with the sample size, our theory demonstrates that the new penalized EL estimator is sparse and consistent with asymptotically normally distributed nonzero components. Numerical simulations and a real data analysis show that the proposed penalized EL works promisingly.
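The object the two penalties act upon is the empirical likelihood itself, whose inner optimization over the Lagrange multiplier can be sketched for a scalar mean; the sketch below omits the paper's penalties entirely:

```python
# Sketch of the empirical-likelihood inner problem for a scalar mean: the
# profile log-EL ratio at a candidate theta comes from optimizing over the
# Lagrange multiplier. The paper's penalties are omitted; this only shows
# the quantity they act upon.
import numpy as np
from scipy.optimize import minimize_scalar

def log_el_ratio(x, theta):
    g = x - theta                           # moment function g(x, theta) = x - theta
    # Feasible lambda must keep 1 + lambda * g_i > 0 for every observation.
    lo = -1.0 / g.max() + 1e-6
    hi = -1.0 / g.min() - 1e-6
    res = minimize_scalar(lambda lam: -np.log1p(lam * g).sum(),
                          bounds=(lo, hi), method="bounded")
    return res.fun                          # = -max_lambda sum_i log(1 + lambda * g_i) <= 0

rng = np.random.default_rng(6)
x = rng.standard_normal(200) + 1.0
print(log_el_ratio(x, 1.0))                 # near 0 at (roughly) the truth
print(log_el_ratio(x, 1.5))                 # strongly negative away from it
```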

Chang, J., Guo, B., & Yao, Q. (2018). Principal component analysis for second-order stationary vector time series. Annals of Statistics, 46, 2094-2124.

We extend principal component analysis (PCA) to second-order stationary vector time series in the sense that we seek a contemporaneous linear transformation for a p-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. Therefore, those lower-dimensional series can be analyzed separately as far as the linear dynamic structure is concerned. Technically, it boils down to an eigenanalysis for a positive definite matrix. When p is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed p and diverging p when the sample size n tends to infinity. Numerical experiments with both simulated and real data sets indicate that the proposed method is an effective initial step in analyzing multiple time series data, which leads to substantial dimension reduction in modelling and forecasting high-dimensional linear dynamical structures. Unlike PCA for independent data, there is no guarantee that the required linear transformation exists. When it does not, the proposed method provides an approximate segmentation, which leads to advantages in, for example, forecasting future values. The method can also be adapted to segment multiple volatility processes.
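The core eigenanalysis can be sketched as follows: standardize the series, build a positive semi-definite matrix from lagged autocovariances, and transform by its eigenvectors. This simplified illustration omits the permutation and FDR-based grouping step:

```python
# Sketch of the eigenanalysis step for segmenting a vector time series:
# standardize so the lag-0 covariance is the identity, form
# W = sum_k Sigma(k) Sigma(k)' from lagged autocovariances, and transform
# by the eigenvectors of W. The permutation / FDR grouping step is omitted.
import numpy as np

def segment_transform(Y, k0=3):
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    vals, vecs = np.linalg.eigh(Yc.T @ Yc / n)
    Z = Yc @ vecs @ np.diag(vals ** -0.5) @ vecs.T      # Sigma(0)^{-1/2} standardization
    W = np.zeros((p, p))
    for k in range(k0 + 1):
        Sk = Z[k:].T @ Z[:n - k] / n                    # lag-k autocovariance
        W += Sk @ Sk.T
    return Z @ np.linalg.eigh(W)[1]                     # candidate segmented series

rng = np.random.default_rng(7)
x = np.zeros((500, 2))
for t in range(1, 500):                                  # two independent AR(1) series
    x[t] = np.array([0.9, -0.7]) * x[t - 1] + rng.standard_normal(2)
Y = x @ rng.standard_normal((2, 2))                      # mix them linearly
X = segment_transform(Y)
print(round(np.corrcoef(X[1:, 0], X[:-1, 1])[0, 1], 3))  # weak cross-serial correlation
```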
