Optimization Properties of Neural Networks: Landscape Analysis and Mean-Field Dynamics
Speaker: Song Mei, Stanford University
Location: 60 Fifth Avenue 150
Date: Friday, February 7, 2020, 3:30 p.m.
Neural networks are among the most powerful models in machine learning, yet the fundamental reasons for this success defy traditional mathematical understanding. Learning a neural network requires optimizing a non-convex, high-dimensional risk function, a problem that is usually attacked with the stochastic gradient descent (SGD) algorithm. For many practical tasks, including image classification, SGD converges to an approximate global optimum of the risk function. Does this happen because local minima are absent, or because SGD somehow avoids them?
In this talk, I will discuss a landscape analysis and a dynamics approach for neural networks. For one-node neural networks, using a uniform convergence argument, we prove that the landscape of the empirical risk resembles that of the population risk. As a consequence, the empirical risk has no spurious local minima despite its non-convexity. For two-layer neural networks, we prove that, in a suitable scaling limit, the SGD dynamics is captured by a certain non-linear partial differential equation. This reduced dynamics can then be used to prove global convergence of noisy SGD to a nearly optimal risk.
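To make the setting concrete, the following is a minimal illustrative sketch (not the speaker's code) of SGD on a two-layer network in the mean-field parameterization, where the output is an average over N hidden neurons, f(x) = (1/N) Σᵢ aᵢ σ(wᵢ·x). The toy target function, learning rate, and per-neuron step-size scaling are all assumptions chosen for illustration; the empirical risk decreases under SGD despite non-convexity.

```python
import numpy as np

# Illustrative sketch only: SGD on a two-layer ReLU network in the
# mean-field scaling f(x) = (1/N) * sum_i a_i * relu(w_i . x).
# Target, data distribution, and hyperparameters are assumptions.

rng = np.random.default_rng(0)
N, d = 200, 5                        # N hidden neurons, input dimension d
W = rng.normal(size=(N, d))          # hidden-layer weights w_i (rows)
a = rng.normal(size=N)               # output weights a_i

def target(x):
    # Toy teacher function (an assumption for this demo)
    return np.maximum(x[0], 0.0) - 0.5 * np.maximum(x[1], 0.0)

lr = 0.05                            # per-neuron step size (assumption)
losses = []
for step in range(5000):
    x = rng.normal(size=d)           # fresh sample: one-pass (online) SGD
    y = target(x)
    h = np.maximum(W @ x, 0.0)       # hidden activations relu(w_i . x)
    pred = a @ h / N                 # mean-field 1/N output scaling
    err = pred - y
    losses.append(0.5 * err ** 2)
    # Gradients of the squared loss carry the same 1/N factor as the
    # output; the step size is scaled by N so each neuron moves at O(1)
    # speed, matching the mean-field time scaling.
    a -= lr * err * h
    W -= lr * err * np.outer(a * (W @ x > 0), x)

early_risk = float(np.mean(losses[:500]))
late_risk = float(np.mean(losses[-500:]))
```

In the N → ∞ limit, the empirical distribution of the neuron parameters (aᵢ, wᵢ) evolves according to the non-linear PDE mentioned in the abstract, which is what makes the averaged parameterization (rather than a plain sum) the natural scaling here.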