Geometry Seminar

Coresets for Finding Approximate Maximum in a Range Space

Speaker: Jeff Phillips, University of Utah

Location: Online Zoom-only

Videoconference link: https://youtu.be/49YHMTHbdxA

Date: Tuesday, October 8, 2024, 2 p.m.

Synopsis:

Consider a geometric range space \((X,\mathcal{S})\) for a point set \(X \subset \mathbb{R}^d\) and geometrically defined ranges \(\mathcal{S}\) induced by containment in certain shapes such as disks, rectangles, or halfspaces.  Let \(X\) be the union of two sets colored red \(R\) or blue \(B\).  The discrepancy maximization problem seeks to find shape \(S \in \mathcal{S}\) which maximizes the discrepancy
\(\phi(S) = | (|R \cap S|/|R|) - (|B \cap S|/|B|) |\).  We solve the \(\varepsilon\)-approximate version of this problem, where we seek some \(\hat S\) such that \(\phi(S^*) - \phi(\hat S) \leq \varepsilon\), where \(S^*\) is the range achieving the maximum discrepancy.

Our solution is via a two-level coreset construction, where we approximate the space of ranges \(\mathcal{S}\) with one coreset, and the ground set \(X\) with another.  We generalize this method towards an algorithm for approximately computing a class of distances between probability distributions (in this case point sets like \(R\) and \(B \)) referred to as Integral Probability Measures (IPMs).  This leads to implications in spatial anomaly detection (as spatial scan statistics), machine learning (as linear classifiers), and statistics (in Kolmogorov-Smirnov distances), and more.  In some case the results are optimal given conditional hardness assumptions.  Moreover, the algorithms might even be practical, scalable, and statistically powerful.

Notes:

Contact Boris Aronov to be placed on the mailing list with notifications and Zoom seminar info.