Probability and Mathematical Physics Seminar

High-dimensional optimization for multi-spiked tensor PCA

Speaker: Cédric Gerbelot, Courant Institute of Mathematical Sciences

Location: Warren Weaver Hall 1302

Date: Friday, October 18, 2024, 11:10 a.m.

Synopsis:

We study the dynamics of two local optimization algorithms, online stochastic gradient descent (SGD) and gradient flow, within the framework of the multi-spiked covariance and tensor model in the high-dimensional regime. This multi-index model arises from the well-known tensor principal component analysis (PCA) problem, where the goal is to infer \(r\) unknown, orthogonal signal vectors on the \(N\)-dimensional unit sphere through maximum likelihood estimation (MLE) from noisy observations of an order-\(p\) tensor. We determine the number of samples and the conditions on the signal-to-noise ratios (SNRs) required to efficiently recover the unknown spikes from natural initializations. Specifically, we distinguish between three types of recovery: exact recovery of each spike, recovery of a permutation of all spikes, and recovery of the correct subspace spanned by the unknown signal vectors.

We show that with online SGD, it is possible to recover all spikes provided the number of samples scales as \(N^{p-2}\), aligning with the computational threshold identified for the rank-one tensor PCA problem (Ben Arous, Gheissari, and Jagannath, 2020, 2021). For gradient flow, we show that the algorithmic threshold for efficient recovery of the first spike is also of order \(N^{p-2}\); recovering the subsequent directions, however, requires the number of samples to scale as \(N^{p-1}\).

Our results are obtained through a detailed analysis of a low-dimensional system that describes the evolution of the correlations between the estimators and the unknown vectors, while controlling the noisy part of the dynamics. In particular, the hidden vectors are recovered one by one through a sequential elimination phenomenon: as soon as one correlation exceeds a critical threshold depending on the order \(p\) of the tensor and on the SNRs, all correlations sharing a row or column index with it become small enough to be negligible, allowing the next correlation to grow and become macroscopic. The order in which the correlations become macroscopic is determined by their initial values and the associated SNRs.
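For concreteness, the observation model behind the synopsis can be written as follows; the Gaussian noise and the normalization are standard choices in the spiked tensor literature and are assumptions here, not necessarily the talk's exact conventions. One observes

\[ Y \;=\; \sum_{i=1}^{r} \lambda_i\, v_i^{\otimes p} \;+\; W, \]

where \(v_1, \dots, v_r\) are the unknown orthogonal signal vectors on the unit sphere, \(\lambda_1, \dots, \lambda_r\) are the SNRs, and \(W\) is a noise tensor with i.i.d. Gaussian entries. MLE then amounts, up to the treatment of the unknown SNRs, to maximizing the correlation \(\sum_{i=1}^{r} \langle Y, x_i^{\otimes p} \rangle\) over frames \(X = (x_1, \dots, x_r)\) of orthonormal estimators, and the low-dimensional system mentioned above tracks the correlations \(m_{ij} = \langle v_i, x_j \rangle\).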
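To make the online SGD dynamics concrete, here is a minimal NumPy sketch for order \(p = 3\). Everything in it is an illustrative assumption rather than the algorithm analyzed in the talk: the loss \(L(X) = -\sum_i \langle Y_t, x_i^{\otimes 3} \rangle\), the noise scaling \(W_t/\sqrt{N}\), the step size, the parameter values, and the polar retraction that keeps the estimators orthonormal.

    import numpy as np

    def online_sgd_multispike(N=30, r=2, snrs=(12.0, 10.0), steps=2000, lr=0.01, seed=0):
        """Toy online SGD for a multi-spiked order-3 tensor model.
        All normalizations here are illustrative assumptions."""
        rng = np.random.default_rng(seed)
        V = np.linalg.qr(rng.standard_normal((N, r)))[0]  # true orthonormal spikes v_j
        X = np.linalg.qr(rng.standard_normal((N, r)))[0]  # random orthonormal initialization
        # rank-r signal tensor: sum_j snrs[j] * v_j (x) v_j (x) v_j
        signal = sum(s * np.einsum('a,b,c->abc', V[:, j], V[:, j], V[:, j])
                     for j, s in enumerate(snrs))
        for _ in range(steps):
            # fresh sample at every step: Y_t = signal + W_t / sqrt(N)
            Y = signal + rng.standard_normal((N, N, N)) / np.sqrt(N)
            # Euclidean gradient of L(X) = -sum_i <Y, x_i (x) x_i (x) x_i>,
            # treating Y as symmetric: grad_i = -3 * Y(., x_i, x_i)
            G = -3.0 * np.einsum('abc,bi,ci->ai', Y, X, X)
            # gradient step followed by a polar retraction, so the columns of X
            # remain an orthonormal frame of estimators
            U, _, Wt = np.linalg.svd(X - lr * G, full_matrices=False)
            X = U @ Wt
        return V.T @ X  # r x r matrix of correlations m_{ji} = <v_j, x_i>

    M = online_sgd_multispike()
    print(np.round(M, 2))  # close to a permutation matrix once all spikes are recovered

Printing \(V^\top X\) along the trajectory makes the sequential elimination described above visible: one correlation saturates first, the entries sharing its row or column shrink, and only then does the next correlation grow. The final matrix being close to a permutation corresponds to the "recovery of a permutation of all spikes" notion from the synopsis.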