New Centroid-based Classifier #5103
Merged
Commits on Jun 18, 2021
Training:
* A fixed vocabulary is set to all tokens that appear in at least 2 samples.
* All out-of-vocabulary tokens are discarded.
* For every token, we set its Inverse Class Frequency (ICF) to `log(ct / cf) + 1`, where `ct` is the total number of classes and `cf` is the number of classes where the token occurs.
* Each sample is converted to a vector of `tf * icf` for every token in the vocabulary. `tf` is `1 + log(freq)`, where `freq` is the number of occurrences of the token in the given sample.
* Samples are L2-normalized.
* For each class (language), we compute the centroid of all its training samples by averaging them and L2-normalizing the result.

Classification:
* For a new sample, we get the L2-normalized vector with `tf * icf` terms for every known token, then classify the sample using the nearest centroid. Cosine similarity is used as the similarity measure.
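The steps above can be sketched in a few lines of Ruby. This is only an illustrative sketch, not the code merged in this PR: the module name `CentroidSketch`, its method names, and the sparse `Hash` representation are all assumptions made for the example, and tokenization is assumed to have already happened (each sample is an array of token strings).

```ruby
# Hypothetical sketch of the described training/classification steps.
# Not Linguist's implementation; names and data layout are assumptions.
module CentroidSketch
  module_function

  # L2-normalize a sparse vector (Hash of token => weight).
  def l2_normalize(vec)
    norm = Math.sqrt(vec.values.sum { |v| v * v })
    return vec.dup if norm.zero?
    vec.transform_values { |v| v / norm }
  end

  # Build the normalized tf * icf vector for one tokenized sample.
  # Tokens without an ICF entry (out of vocabulary) are discarded.
  def vectorize(tokens, icf)
    counts = tokens.tally.select { |tok, _| icf.key?(tok) }
    weights = counts.to_h { |tok, freq| [tok, (1.0 + Math.log(freq)) * icf[tok]] }
    l2_normalize(weights)
  end

  # samples: Hash of class name => Array of token Arrays.
  def train(samples)
    # Vocabulary: tokens that appear in at least 2 samples.
    sample_freq = Hash.new(0)
    samples.each_value do |docs|
      docs.each { |toks| toks.uniq.each { |t| sample_freq[t] += 1 } }
    end
    vocab = sample_freq.select { |_, n| n >= 2 }.keys

    # ICF: log(ct / cf) + 1, with ct = number of classes and
    # cf = number of classes in which the token occurs.
    ct = samples.size.to_f
    icf = vocab.to_h do |tok|
      cf = samples.count { |_, docs| docs.any? { |toks| toks.include?(tok) } }
      [tok, Math.log(ct / cf) + 1.0]
    end

    # Centroid per class: mean of its sample vectors, then L2-normalized.
    centroids = samples.to_h do |label, docs|
      sum = Hash.new(0.0)
      docs.each { |toks| vectorize(toks, icf).each { |t, w| sum[t] += w } }
      [label, l2_normalize(sum.transform_values { |w| w / docs.size })]
    end

    { icf: icf, centroids: centroids }
  end

  # Nearest centroid by cosine similarity. Both vectors are unit length,
  # so the dot product is the cosine.
  def classify(model, tokens)
    vec = vectorize(tokens, model[:icf])
    best = model[:centroids].max_by do |_, centroid|
      vec.sum { |tok, w| w * centroid.fetch(tok, 0.0) }
    end
    best.first
  end
end
```

Because every sample vector and every centroid is L2-normalized, the cosine similarity reduces to a sparse dot product, which is why `classify` only iterates over the tokens present in the new sample.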
Commit 6644c34
Commit 20b33ee
Update lib/linguist/samples.rb
Co-authored-by: Colin Seymour <colin@github.com>
Commit ec2ca35
Update test/test_classifier.rb
Co-authored-by: Colin Seymour <colin@github.com>
Commit 9b6fa51
Commits on Jul 1, 2022
Commit 6c13235
Commits on Oct 20, 2022
Commit 1712974
Commits on Nov 14, 2022
Commit a96276c
Commit 7b80d5e
Commits on Mar 6, 2023
Commit 8b709be
Commit 91de502
Commit afc0417
Commits on Sep 8, 2023
Commit 37db40b
Commits on Jun 8, 2024
Commit 8e475ff
Commits on Aug 6, 2024
Commit 1d559dd
Commit 170ebda
Commit 43716ce
Commit 1d50126
Commits on Aug 14, 2024
Commit 9d922e3