Regularized methods to study multivariate data in high dimensional settings: theory and applications.

Abstract

In this PhD thesis we study general linear model (multivariate linear model) in high dimensional settings. We propose a novel variable selection approach in the framework of multivariate linear models taking into account the dependence that may exist between the responses. It consists in estimating beforehand the covariance matrix of the responses and to plug this estimator in a Lasso criterion, in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated both from a theoretical and a numerical point of view. More precisely, we give general conditions that the estimators of the covariance matrix and its inverse have to satisfy in order to recover the positions of the zero and non-zero entries of the coefficient matrix when the number of responses is not fixed and can tend to infinity. We also propose novel, efficient and fully data-driven approaches for estimating Toeplitz and large block structured sparse covariance matrices in the case where the number of variables is much larger than the number of samples without limiting ourselves to block diagonal matrices. These approaches are applied to different biological issues in metabolomics, in proteomics and in immunology.

Publication
Université Paris Saclay/ AgroParisTech

Related