Update dataset usage and to version 0.11 for Sentiment Analysis #680
@JRAlexander I tested the PR and it works fine, but I have a question about the output. I believe the probability should also be above 50% for the prediction to be correct. Correct me if I am wrong. @CESARDELATORRE Adding Cesar for reference.
It leaves the door open for improving the model. This is the dataset we are using. It gives better accuracy than the wikidetox dataset.
Well, a rule of thumb is to aim for an accuracy higher than 80%. However, depending on the business or domain, it might need to be a lot higher, or a lower value might be acceptable. With small datasets it is normal in most cases that neither the accuracy nor the probabilities will be very high. Since we're using small datasets for short training runs, I'd suggest not showing the probability in the code or output.
I'm going to leave it in there with an explanation about the small data.
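The relationship between probability, predicted label, and accuracy discussed above can be illustrated with a minimal sketch. This is not ML.NET code and not the sample from this PR: the function names, the 0.5 threshold, and the probability values are all hypothetical, chosen only to show how a fixed cutoff turns probabilities into labels and how accuracy is then computed on a small set.

```python
from __future__ import annotations


def predict_sentiment(probability: float, threshold: float = 0.5) -> bool:
    """Return True (positive) when the probability exceeds the threshold.

    The 0.5 default is an assumption for illustration, not a value
    taken from the sample under review.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return probability > threshold


def accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Fraction of predictions that match the true labels."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical model outputs and true labels (made up for this sketch).
probabilities = [0.91, 0.55, 0.48, 0.12, 0.67]
labels = [True, True, True, False, False]

predictions = [predict_sentiment(p) for p in probabilities]
print(predictions)                    # [True, True, False, False, True]
print(accuracy(predictions, labels))  # 0.6
```

In this made-up data, the 0.55 case is classified correctly even though the probability is only slightly above the cutoff, while the overall accuracy of 0.6 falls below the 80% rule of thumb, which is the kind of result a short explanation about small datasets would need to cover.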
Looks good @JRAlexander. Added a suggestion. I'll leave the decision to implement or not up to you.
Added your suggestion as an issue here: dotnet/docs#11221
Thanks, @luisquintanilla!
Revising with UCI Sentiment Dataset for greater model accuracy, and updating to 0.11.0.