Mathematics Colloquium

Stochastic Gradients with Adaptive Stepsizes

Speaker: Rachel Ward, UT Austin

Location: Warren Weaver Hall 1302

Date: Monday, October 17, 2022, 3:45 p.m.


State-of-the-art deep learning algorithms are constantly evolving and improving, but the core optimization methods used to train neural networks have remained largely stable. Adaptive gradient variations of stochastic gradient descent, such as Adagrad and Adam, are popular in practice because they exhibit significantly more stable convergence than plain stochastic gradient descent while requiring less hyperparameter tuning. We will discuss the current state of theoretical understanding of these algorithms, as well as several open questions inspired by persisting gaps between theory and practice.