Probability and Mathematical Physics Seminar
Saddle-to-Saddle Dynamics in Deep Neural Networks: a Loss Landscape Perspective
Speaker: Arthur Jacot, Courant Institute
Location: Warren Weaver Hall 1302
Date: Friday, February 10, 2023, 11:10 a.m.
Synopsis:
Two distinct regimes appear in large DNNs, depending on the variance of the parameters at initialization. For large initializations, the initial parameters lie inside a narrow valley of global minima, and gradient flow converges very quickly to a nearby minimum, never approaching any saddle. The infinite-width dynamics in this regime are approximately linear and are described by the Neural Tangent Kernel (NTK). In contrast, for small initializations, we observe in linear networks a Saddle-to-Saddle regime, where gradient flow visits the neighborhoods of a sequence of saddles, each corresponding to linear maps of increasing rank. This leads to an implicit bias towards learning low-rank linear maps. Similar properties are observed in non-linear networks.
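As a rough illustration of the Saddle-to-Saddle regime (not part of the talk), the following minimal NumPy sketch trains a depth-2 linear network W2 W1 by gradient descent from a small random initialization to fit a rank-3 target map; the dimension, the target singular values (5 > 3 > 1), the initialization scale, and the step size are all arbitrary choices made for the demo. Tracking the top singular values of the product W2 W1 should show them being picked up one at a time, with plateaus (the neighborhoods of the rank-k saddles) in between.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy setup: a rank-3 target map on R^10 whose nonzero
    # singular values (5 > 3 > 1) are well separated, so the three modes
    # should be learned at clearly distinct times.
    d = 10
    U, _ = np.linalg.qr(rng.standard_normal((d, d)))
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    target = U @ np.diag([5.0, 3.0, 1.0] + [0.0] * (d - 3)) @ V.T

    # Depth-2 linear network x -> W2 @ W1 @ x with small initialization
    # (the regime where Saddle-to-Saddle dynamics are expected).
    scale = 1e-4
    W1 = scale * rng.standard_normal((d, d))
    W2 = scale * rng.standard_normal((d, d))

    lr = 0.01  # small step size, as a crude proxy for gradient flow
    for step in range(1201):
        # Gradient of the loss 0.5 * ||W2 W1 - target||_F^2,
        # applied to both factors simultaneously.
        residual = W2 @ W1 - target
        W1, W2 = W1 - lr * (W2.T @ residual), W2 - lr * (residual @ W1.T)
        if step % 100 == 0:
            svals = np.linalg.svd(W2 @ W1, compute_uv=False)[:4]
            print(f"step {step:4d}  top singular values of W2 W1:",
                  np.round(svals, 3))

Heuristically, the smaller the initialization scale, the longer gradient flow lingers near each saddle (the escape time for the k-th mode grows like log(1/scale) divided by the k-th target singular value), so shrinking the scale in this sketch should make the rank-increasing plateaus more pronounced.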