Probability and Mathematical Physics Seminar

Saddle-to-Saddle Dynamics in Deep Neural Networks: a Loss Landscape Perspective

Speaker: Arthur Jacot, Courant Institute

Location: Warren Weaver Hall 1302

Date: Friday, February 10, 2023, 11:10 a.m.


Two distinct regimes appear in large DNNs, depending on the variance of the parameters at initialization. For large initializations, the initial parameters lie inside a narrow valley of global minima, and gradient flow converges very quickly to a nearby local minimum, never approaching any saddle. The infinite-width dynamics in this regime are approximately linear and are described by the Neural Tangent Kernel (NTK). In contrast, for small initializations, we observe in linear networks a Saddle-to-Saddle regime, where gradient flow visits the neighborhoods of a sequence of saddles, each corresponding to linear maps of increasing rank. This leads to an implicit bias towards learning low-rank linear maps. Similar properties are observed in non-linear networks.
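The Saddle-to-Saddle phenomenon described above can be reproduced in a minimal numerical sketch (not taken from the talk): a two-layer linear network W2 @ W1 trained by gradient descent from a small random initialization on a full-rank target. The target, dimensions, learning rate, and initialization scale below are illustrative assumptions. Tracking the singular values of the learned product shows the larger target singular value being fit first, with the smaller one lingering near zero on a plateau (a saddle neighborhood) before it too is learned — rank increases one step at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target linear map with well-separated singular values (4 and 1).
target = np.diag([4.0, 1.0])
d = 2

# Small initialization scale puts gradient descent in the saddle-to-saddle regime.
scale = 1e-3
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))

lr = 0.05
history = []  # singular values of the product W2 @ W1, recorded every 20 steps
for step in range(4000):
    prod = W2 @ W1
    grad = prod - target           # gradient of 0.5 * ||W2 W1 - target||^2 w.r.t. the product
    W1 -= lr * (W2.T @ grad)       # chain rule through each factor
    W2 -= lr * (grad @ W1.T)
    if step % 20 == 0:
        history.append(np.linalg.svd(W2 @ W1, compute_uv=False))

final_svals = np.linalg.svd(W2 @ W1, compute_uv=False)
```

After training, `final_svals` is close to the target spectrum (4, 1). The incremental-rank bias shows up in `history`: at intermediate steps the top singular value has already saturated near 4 while the second is still near zero, i.e. the trajectory passes near a rank-1 saddle before escaping to the full-rank minimum.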