Area under the receiver operating characteristic curve (AUROC) calculation. #977
Comments
I’m not aware of a distributed-friendly way to calculate that metric, but happy to take a PR adding one if that exists.
(Sent by email in reply to Stephen Pardy's message of Sep 5, 2023, 4:58 PM on dask/dask-ml issue #977.)
I don't know enough about this specific ML metric, but maybe if we can translate the problem to something more general then I can be of use. For example, if someone had a distribution of values spread in an unsorted way across many machines and wanted to plot an approximate sorted distribution of those points, there are a couple of things they could do:
I may be entirely on the wrong track though. @stephenpardy can you help by reducing your problem a little bit to general numbers / computational terms? I suspect that it'll be easier for folks to help in that case.
A common metric for classification model performance is the area under the receiver operating characteristic curve (AUROC, AUROCC, or often simply AUC), which is the area under the curve created by plotting the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies.
Wikipedia.
This metric is available in sklearn, but is currently missing from Dask-ML. One issue with computing this metric in Dask is that the standard implementation of AUROC requires searching over a sorted array, and sorting is often expensive in distributed computing.
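To make the sorting dependency concrete, here is a minimal single-machine sketch of the sort-based calculation in plain Python. It is not sklearn's or Dask-ML's code; the function name `auroc_sorted` is hypothetical, and labels are assumed to be 0/1. It sorts all scores once, walks the thresholds from high to low, and integrates TPR over FPR with the trapezoidal rule, treating tied scores as a single threshold.

```python
def auroc_sorted(y_true, y_score):
    """Exact AUROC via one global sort by descending score (labels are 0/1)."""
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")

    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample tied at this score: they share one threshold.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr = tp / n_pos
        fpr = fp / n_neg
        # Trapezoid between the previous ROC point and the current one.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return area
```

The `sorted` call is the step that becomes a shuffle across workers when the scores live in a distributed array; everything after it is a single linear pass.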
I would be interested in a discussion of whether a naive version of this could be implemented using sorting and whether there are any known methods of computation that avoid the sort.
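One candidate for avoiding the global sort, offered as a sketch rather than a known Dask-ML method, is to bin the scores: each chunk counts positives and negatives per fixed-width bin, the per-chunk counts are summed, and the AUROC is computed from the summed counts. The result is approximate, because samples falling in the same bin are treated as tied. The function names and the `n_bins` parameter below are hypothetical, and the sketch assumes 0/1 labels and scores in [0, 1].

```python
def bin_counts(y_true, y_score, n_bins=1000):
    """Per-chunk step: count positives and negatives in each score bin."""
    pos = [0] * n_bins
    neg = [0] * n_bins
    for y, s in zip(y_true, y_score):
        # Assumes scores lie in [0, 1]; out-of-range values are clamped.
        b = max(0, min(int(s * n_bins), n_bins - 1))
        if y == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    return pos, neg


def merge_counts(a, b):
    """Reduction step: element-wise sum of two (pos, neg) count pairs."""
    return (
        [x + y for x, y in zip(a[0], b[0])],
        [x + y for x, y in zip(a[1], b[1])],
    )


def auroc_from_counts(pos, neg):
    """Approximate AUROC from binned counts; same-bin samples count as ties."""
    n_pos, n_neg = sum(pos), sum(neg)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    tp = 0          # positives in strictly higher bins
    pairs = 0.0     # correctly ordered (positive, negative) pairs, ties worth 1/2
    for p, n in zip(reversed(pos), reversed(neg)):
        pairs += n * (tp + p / 2)
        tp += p
    return pairs / (n_pos * n_neg)
```

Because `merge_counts` is associative and commutative, the per-chunk counts can be combined in any order, which is the property a tree reduction over chunks would need. The trade-off is resolution: more bins reduce the tie error at the cost of larger count arrays.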