Optimizing ROC Curves with a Sort-Based Surrogate Loss for Binary Classification and Changepoint Detection
When
Abstract: Receiver Operating Characteristic (ROC) curves are useful for evaluating binary classification models, but difficult to use for learning since the Area Under the Curve (AUC) is a piecewise constant function of predicted values. ROC curves can also be used in other problems with false positive and true positive rates such as changepoint detection. We show that in this more general context, the ROC curve can have loops, points with highly sub-optimal error rates, and AUC greater than one. This observation motivates a new optimization objective: rather than maximizing the AUC, we would like a monotonic ROC curve with AUC=1 that avoids points with large values for Min(FP,FN). We propose an L1 relaxation of this objective that results in a new surrogate loss function called the AUM, short for Area Under Min(FP, FN). Whereas previous loss functions are based on summing over all labeled examples or pairs, the AUM requires a sort and a sum over the sequence of points on the ROC curve. We show that AUM directional derivatives can be efficiently computed and used in a gradient descent learning algorithm. In our empirical study of supervised binary classification and changepoint detection problems, we show that our new AUM minimization learning algorithm results in improved AUC and speed relative to previous baselines.
Location: PAS 522