
Is the ROC-AUC score calculation correct? #30

Open
bliu188 opened this issue Mar 12, 2022 · 6 comments

Comments

@bliu188

bliu188 commented Mar 12, 2022

I am testing the RETAIN model but am puzzled by the difference between ROC-AUC and accuracy. The ROC-AUC value suggests a nearly non-discriminative model:
Epoch: 9 - ROC-AUC: 0.503192 PR-AUC: 0.147242
but the accuracy of 0.8534 is not bad.
It is hard to imagine the model is already overfitting at epoch 9. The on_epoch_end callback uses predict_on_batch, which seems fine. I have no clue what is wrong here.
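For reference, a minimal sketch of this kind of epoch-end check, assuming a binary Keras classifier, scikit-learn metrics, and a plain predict_on_batch loop (the callback and variable names below are illustrative, not the repo's actual code):

```python
# Illustrative epoch-end ROC-AUC / PR-AUC / accuracy check (not the repo's callback).
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score
from tensorflow import keras

class EpochEndAUC(keras.callbacks.Callback):
    def __init__(self, x_val, y_val, batch_size=128):
        super().__init__()
        self.x_val, self.y_val, self.batch_size = x_val, y_val, batch_size

    def on_epoch_end(self, epoch, logs=None):
        # Collect probabilities batch by batch, mirroring a predict_on_batch loop.
        probs = []
        for i in range(0, len(self.x_val), self.batch_size):
            probs.append(self.model.predict_on_batch(self.x_val[i:i + self.batch_size]))
        probs = np.concatenate(probs).ravel()
        roc = roc_auc_score(self.y_val, probs)            # needs probabilities, not 0/1 labels
        pr = average_precision_score(self.y_val, probs)
        acc = accuracy_score(self.y_val, probs > 0.5)
        print(f"Epoch: {epoch} - ROC-AUC: {roc:.6f} PR-AUC: {pr:.6f} acc: {acc:.4f}")
```

Note that with a positive rate in the ~15–20% range, predicting the majority class for every example already yields roughly 0.85 accuracy, so high accuracy next to a ~0.5 ROC-AUC is consistent with a model that is not separating the classes at all.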

@jstremme
Contributor

@bliu188, can you share all the output from your training run? Also, what's the prevalence of your target variable?

@bliu188
Author

bliu188 commented Mar 25, 2022 via email

@tRosenflanz
Contributor

tRosenflanz commented Mar 25, 2022

It might be a learning rate issue or something in the data prep, since it is a bit more complicated than a simple LSTM.

The ROC calculation should be correct. The PR-AUC of an untrained model also equals the positive class prevalence, which explains why you are getting around 0.15.
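A quick sanity check of that point (a sketch with made-up random scores, not data from this issue): uninformative scores give ROC-AUC near 0.5 and PR-AUC near the positive prevalence.

```python
# Random (uninformative) scores: ROC-AUC ~ 0.5, PR-AUC ~ positive class prevalence.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.15).astype(int)    # ~15% positive class
y_score = rng.random(100_000)                        # scores carry no signal

print(roc_auc_score(y_true, y_score))            # ~0.50
print(average_precision_score(y_true, y_score))  # ~0.15, i.e. the prevalence
```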

@jstremme
Contributor

> Thanks, Joel. What kind of output would be helpful for you? I added keras.metrics.AUC to cross-check your ROC-AUC in the implementation. The two agree at ~0.5 AUC, suggesting nothing is wrong with the AUC calculation. The target prevalence is about 20%. I did try sorting the sequences ascending and descending; it does not appear to make any difference. But I tested a two-layer LSTM, which gave 0.8 AUC. I also tested Deep Records with 0.75 AUC, and just tested a GRU with pyTorch_ehr with 0.76 AUC. The RETAIN model should be competitive based on a recent comparison: https://doi.org/10.1016/j.jbi.2019.103337. I have not done any model tuning, but I do not think tuning would make this much difference. The number of epochs did not affect the test AUC either; it seems the training does not improve at all with more epochs. Thanks, Bing

I was just curious to see the AUC at each epoch. Typically RETAIN achieves its best validation AUC in fewer than 10 epochs. You could always try dropping the learning rate to see if that yields any benefits. Hope this helps, and thanks for the paper link!
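A sketch of both suggestions from this thread, i.e. cross-checking ROC-AUC with keras.metrics.AUC at compile time and lowering the Adam learning rate. The stand-in model and the 1e-4 value are placeholders, not values taken from this repo.

```python
# Sketch: compile-time AUC cross-check plus a lower learning rate (illustrative values only).
import tensorflow as tf

# Stand-in model; in practice this would be the RETAIN model built by the repo's code.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # lower than the Keras default of 1e-3
    loss="binary_crossentropy",
    metrics=[
        tf.keras.metrics.AUC(curve="ROC", name="roc_auc"),   # should track the sklearn callback's ROC-AUC
        tf.keras.metrics.AUC(curve="PR", name="pr_auc"),
        "accuracy",
    ],
)
```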

@bliu188
Author

bliu188 commented Mar 25, 2022 via email

@bliu188
Author

bliu188 commented Oct 11, 2022 via email
