Mathematics Colloquium
Stochastic Gradients with Adaptive Stepsizes
Speaker: Rachel Ward, UT Austin
Location: Warren Weaver Hall 1302
Date: Monday, October 17, 2022, 3:45 p.m.
Synopsis:
State-of-the-art deep learning algorithms are constantly evolving and improving, but the core optimization methods used to train neural networks have remained largely stable. Adaptive gradient variants of stochastic gradient descent, such as Adagrad and Adam, are popular in practice because they exhibit significantly more stable convergence than plain stochastic gradient descent while requiring less hyperparameter tuning. We will discuss the current state of theoretical understanding of these algorithms, as well as several open questions inspired by persistent gaps between theory and practice.
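
As context for the synopsis, the sketch below illustrates the general idea of an adaptive stepsize in the AdaGrad-norm style: a single scalar stepsize shrinks as squared gradient norms accumulate, so no manually tuned decay schedule is needed. This is only an illustrative example, not the speaker's exact formulation or results; the function and parameter names (adagrad_norm_sgd, grad, eta, b0, steps) are hypothetical.

    import numpy as np

    def adagrad_norm_sgd(grad, x0, eta=1.0, b0=1e-2, steps=1000):
        """Stochastic gradient descent with one adaptive (scalar) stepsize.

        grad(x) is assumed to return a stochastic gradient at x.
        The stepsize eta / b_t decreases automatically as the running
        sum of squared gradient norms b_t^2 grows.
        """
        x = np.array(x0, dtype=float)
        b2 = b0 ** 2                      # running sum of squared gradient norms
        for _ in range(steps):
            g = grad(x)
            b2 += np.dot(g, g)            # accumulate ||g_t||^2
            x -= (eta / np.sqrt(b2)) * g  # adaptive stepsize update
        return x

For example, passing grad = lambda x: 2 * x + np.random.randn(*x.shape) (a noisy gradient of a simple quadratic) drives the iterates toward the minimizer without hand-tuning a stepsize schedule, which is the practical appeal of these methods highlighted in the synopsis.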