Applied Math Seminar

Are Gaussian data all you need for machine learning theory?

Speaker: Florent Krzakala, EPFL

Location: Warren Weaver Hall 1302

Date: Friday, April 14, 2023, 2:30 p.m.

Synopsis:

Clearly, the answer is no! Nevertheless, the Gaussian assumption remains prevalent among theoreticians, particularly in high-dimensional statistics and physics, though less so in traditional statistical learning circles. To what extent are Gaussian features merely a convenient choice for certain theoreticians, or genuinely an effective model for learning? In this talk, I will review recent progress on these questions, achieved using rigorous probabilistic approaches in high dimensions and techniques from mathematical statistical physics. I will demonstrate that, despite its apparent limitations, the Gaussian approach is sometimes much closer to reality than one might expect. In particular, I will discuss key findings from a series of recent papers that showcase the Gaussian equivalence of generative models, the universality of Gaussian mixtures, and the conditions under which a single Gaussian can characterize the error in high-dimensional estimation. These results illuminate the strengths and weaknesses of the Gaussian assumption, shedding light on its applicability and limitations in the realm of theoretical machine learning.