Search rankings file is calculated by partial information #3778
Comments
After investigating this a bit, it seems that we don't have the information on newer CDN log entries to filter this properly any more, and ignoring the filter results in some strange results. We have 2 options here:
Let's try another approach: stop using the rankings file when calculating search results (but keep it for the default list).
@ryuyu wrote: Methodology Query List Used: (this list was chosen arbitrarily by packages on the front page and ASP NET request). Test Matrix: Analysis Conclusion
We got an additional customer complaint regarding this: When you search for "protobuf", Google.Protobuf is nearly at the bottom with 500k+ downloads. Many packages with fewer than 100 downloads are ranked higher. P.S. I was surprised there are no controls to reorder the search results (Most downloads, Last updated, etc.)
NUnit is now ranked higher than Newtonsoft.Json, despite having far fewer downloads. This is because the 2.x usage of NUnit has surpassed that of Newtonsoft.Json. The current download and weight values for the top 10 packages in the rankings file are:
You'll see that updates are worth half the weight of installs. The weight of NUnit is This is why NUnit is higher. |
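To make the "updates are worth half of installs" point concrete, here is a minimal sketch. The exact formula and the real numbers are not shown in this thread, so the function name, the `update_factor` parameter, and all counts below are made up for illustration only.

```python
def ranking_weight(installs: int, updates: int, update_factor: float = 0.5) -> float:
    """Hypothetical weight: installs count fully, updates count at update_factor."""
    return installs + update_factor * updates


# Made-up counts, not taken from the rankings file.
mostly_installs = ranking_weight(installs=900, updates=100)   # 1000 counted operations
mostly_updates = ranking_weight(installs=200, updates=1200)   # 1400 counted operations

# The package with fewer counted operations still gets the higher weight,
# because most of its operations are installs.
print(mostly_installs > mostly_updates)
```

Under this assumption, the mix of install vs. update operations matters as much as the raw count, which is one way a package with fewer downloads can end up ranked higher.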
Finding packages in the Visual Studio Package Manager is difficult without proper relevance sorting. Developers may accidentally use a similar (and potentially malicious) package if they are not very careful.
The rankings file is no longer used by the primary search experience. Azure Search relies primarily on the total download count and does not consider the install/update metric sent by V2 clients. Issue #7186 tracks the fundamental problem of download counts with respect to direct vs. transitive dependencies. Issue https://github.com/nuget/engineering/issues/1321 tracks the clean-up of these old reports.
The rankings file calculation only counts downloads with an "update" or "install" operation. However, newer clients do not provide this information; they don't send it as part of the header:
https://github.com/NuGet/NuGet.Jobs/blob/master/src/Search.GenerateAuxiliaryData/SqlScripts/Rankings.sql#L25
https://www.nuget.org/stats/packages/newtonsoft.json?groupby=Version&groupby=Operation
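A rough sketch of why this skews the rankings, assuming each log entry is a dict with hypothetical `package` and `operation` keys (this does not reproduce the linked Rankings.sql, it only mimics the filter described above):

```python
from collections import Counter

# Only these operations are counted, as described above.
COUNTED_OPERATIONS = {"install", "update"}


def counted_downloads(log_entries):
    """Count downloads per package, skipping entries without a counted operation."""
    counts = Counter()
    for entry in log_entries:
        operation = (entry.get("operation") or "").lower()
        if operation in COUNTED_OPERATIONS:
            counts[entry["package"]] += 1
    return counts


# Made-up entries: the last two mimic newer clients that send no operation.
sample_entries = [
    {"package": "NUnit", "operation": "Install"},
    {"package": "Newtonsoft.Json", "operation": "Update"},
    {"package": "Newtonsoft.Json", "operation": None},
    {"package": "Newtonsoft.Json"},
]

counts = counted_downloads(sample_entries)
print(counts)
```

In this made-up sample, Newtonsoft.Json has three downloads but only one is counted, so the resulting ranking reflects only the share of traffic coming from older clients.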