Student Probability Seminar

Using measure theory to reassure AI engineers about uncertainty quantification

Speaker: Colin McSwiggen, NYU Courant

Location: Warren Weaver Hall 202

Date: Wednesday, April 10, 2024, 12:30 p.m.

Synopsis:

A statistical model is said to be calibrated if it has the appropriate level of confidence in its own predictions: that is, the confidence that it assigns to a predicted outcome should accurately reflect that outcome's likelihood.  For example, if a weather model is calibrated, then on all of the days when it predicts a 30% chance of rain, it should actually rain 30% of the time.  Calibration is crucial for managing the risks associated with incorrect predictions, but modern deep learning models are systematically miscalibrated: they are overconfident when they are incorrect.  To make matters worse, theorists can't agree on how miscalibration should be quantified!  The prevailing miscalibration metric in engineering applications is the expected calibration error (ECE), which has been widely criticized because it is discontinuous: a tiny change in the model can lead to a large change in the error.  In this talk, I'll try to convince you that this problem isn't really a problem, that ECE was fine all along, and that engineers should feel free to keep using it the way they always have (at least for binary classification tasks).  The argument will require us to answer a strange but fundamental question about the topological properties of the conditional expectation operator.
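
For context, a minimal sketch of the standard definitions used in the binary-classification setting (these formulas are not part of the abstract; the notation, with Y a binary outcome and f(X) in [0,1] the model's predicted confidence that Y = 1, is assumed here for illustration):

% Perfect calibration: among inputs receiving a given confidence, the outcome
% occurs with exactly that probability, i.e. the predicted confidence agrees
% with the conditional expectation of the outcome.
\[
  \mathbb{E}\bigl[\,Y \mid f(X)\,\bigr] \;=\; f(X) \qquad \text{almost surely.}
\]

% Expected calibration error: the mean absolute gap between the predicted
% confidence and the conditional probability of the outcome given that confidence.
\[
  \mathrm{ECE}(f) \;=\; \mathbb{E}\Bigl[\,\bigl|\,\mathbb{E}[Y \mid f(X)] - f(X)\,\bigr|\,\Bigr].
\]

% In practice ECE is estimated by binning the predicted confidences into
% B_1, ..., B_M: n_b is the number of samples in bin B_b, acc(B_b) their
% empirical frequency of Y = 1, and conf(B_b) their average predicted confidence.
\[
  \widehat{\mathrm{ECE}} \;=\; \sum_{b=1}^{M} \frac{n_b}{n}\,
  \bigl|\,\mathrm{acc}(B_b) - \mathrm{conf}(B_b)\,\bigr|.
\]

The appearance of the conditional expectation E[Y | f(X)] in the first two displays is presumably what connects the metric to the question about the conditional expectation operator mentioned at the end of the abstract.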