-
Notifications
You must be signed in to change notification settings - Fork 8
Use normalized package names where suitable #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
I've left this PR as draft for now to get some feedback from you @aganders3 as to whether this is what you were imagining. One thing is I think we'll need to clear the |
At first glance I think yes. Sorry but it's a holiday weekend here so I probably won't get a chance to look at this until Tuesday! |
y'all think this will help with some of the name issues between |
@psobolewskiPhD yes this should address that issue. The idea is that you'd always be entering the real plugin name in URLs etc. but in the backend we also store the normalized name and use that for lookups and for serving the correct plugin. |
Hm, on napari-hub that possibly could be resolved server-side by normalizing the name in the URL, but I'm not sure that will work with the static hub-lite. I'll have to look into if there's a simple mechanism for that, otherwise it should be possible to simply duplicate pages with normalized/non-normalized names. I'm not sure right now if that's a good idea or not. This PR is at least a step towards consistency in that regard, though. |
@aganders3 I think maybe I'm confused. As I understand, the issue on the hub is that if you have a "non-normalized" name e.g. I think what we want is for both (Of course we'd also want to make sure we use the normalized name on hub-lite, but that would be a separate PR using the changes here) |
No, I think you are correct and I am confused. I was confused by this line in the README:
My understanding was that with a static export, that code would be run only once on site generation (build). Looking closer now though I think that may not be the case. It looks like this path is what Next.js calls an "API route" and will indeed be handled on the server. I thought npe2api was served via GitHub pages, but it's actually Vercel, so this makes sense! I also learned "Route handlers" will let you do this client-side as well with some different configuration. So you can indeed do something similar with a fully-static deployment (like hub-lite). |
@aganders3 yep we discussed this at the Eurasia community meeting as well and we should be able to organize "redirects" from the non-normalized name to the normalized name and vice-versa. At community meeting we decided we should always store and key by normalized name, and all files e.g. manifest files are named by the normalized name, and the API should be accessed via its normalized name. I'm going to update this PR to reflect that. I think it should make things a lot tidier. |
This reverts commit bf9020e.
Ok @aganders3 I think this is kinda ready for a review (leaving the conflicts for now and will resolve just before merge). The current state is that all files are keyed by If you try to access a manifest via the API you can do so using the true plugin name OR the normalized name. Check out the vercel preview above to play with it. What does this mean for users of the API?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! Maybe we should test the plugin manager against this manually by hard-coding the URL to the vercel preview? I can try to find time for that.
Note that the github folder has not been updated, because fetching github information is essentially broken (has been for a while it turns out). I think we should tackle that in a separate PR?
I think that's okay - I'm not even familiar with what this is meant to be doing, so will need to look into it.
@@ -188,7 +201,7 @@ def conda_data_wrapper(args): | |||
{ | |||
**pkg, | |||
"pypi_versions": sorted( | |||
active_pypi_versions.get(pkg["name"], []), key=Version, reverse=True | |||
active_pypi_versions.get(pkg["normalized_name"], {}).get("pypi_versions", []), key=Version, reverse=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the fault of this PR, but it appears this sorting/deduplication is already done by bigquery.py
(but not find_by_classifier.py
). Maybe that can be unified somehow? My first thought is to remove it here and add it to find_by_classifier.py
.
|
||
# sort things | ||
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"].lower()) | ||
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it matters, but this sorting might be slightly different from before. Previously sorting by x["name"].lower()
which is like - partly normalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it feels like we are duplicating sorting in too many places. I think some of these scripts could be combined.
# move the errors file to the top /public folder | ||
desired_errors_path = PUBLIC / "errors.json" | ||
current_error_path = Path(f"{PUBLIC}/manifest/errors.json") | ||
if current_error_path.exists(): | ||
current_error_path.rename(desired_errors_path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on why errors.json
would not be in the "desired" location already. Can this be fixed in fetch_manifests.py
if it's causing a problem?
I don't think there is (and maybe not possible there will be) a python package called errors
but it seems like that would cause an issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatedly, I think people have been confused by the errors.json
file. It might be worth adding a little more text to index.js
explaining what it is. Likewise we should link to the errors from the hub-lite FAQ page if people are looking for "why doesn't my plugin show up?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @aganders3. Was there a particular reason that you wanted to move the errors?
name: str | ||
version: str | ||
display_name: str | ||
summary: str | ||
author: str | ||
license: str | ||
home_page: str | ||
project_url: List[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this just missing from #50?
I like having this interface, but maybe it makes sense to use a dataclass so items are not missed in the future.
No need to change that here!
One note - it seems we lose some plugins here. I think we should investigate if these were duplicates or what.
|
After accounting for just differences in normalization (likely dupes on main?) I get this. I checked a few of these and they look like projects that have been renamed. Some also seem like they've been deleted from GitHub but not PyPI, maybe. I guess in general that does not cause the old versions to be removed, but I believe this PR starts with a clean slate. I need to think harder about what is desirable here.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DragaDoncila While I'm not averse to merging as is, I think that there are a few improvements that can be made now, later, or never 😉
- Split PR into scripts changes/pages changes from data changes (
public
). Not strictly necessary but a good practice in the future since it separates logic from applying the logic on the data. - use packaging library's normalization utility functions
- reduce duplication across the scripts
- consider renaming script files to be more descriptive of their purpose in the workflow
@@ -1,16 +1,33 @@ | |||
{ | |||
"alveoleye": null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why these are no longer sorted into the rest of the file? Was that by design?
# move the errors file to the top /public folder | ||
desired_errors_path = PUBLIC / "errors.json" | ||
current_error_path = Path(f"{PUBLIC}/manifest/errors.json") | ||
if current_error_path.exists(): | ||
current_error_path.rename(desired_errors_path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @aganders3. Was there a particular reason that you wanted to move the errors?
mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json") | ||
if not mf_file.exists(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json") | |
if not mf_file.exists(): | |
manifest_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json") | |
if not manifest_file.exists(): |
I would reduce the cognitive load and be explicit in the variable name.
|
||
# sort things | ||
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"].lower()) | ||
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it feels like we are duplicating sorting in too many places. I think some of these scripts could be combined.
@@ -0,0 +1,19 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend removing this file completely and use the packaging
libraries convenience utilities for normalization of names. I believe we are already using packaging in the other scripts. https://packaging.pypa.io/en/stable/utils.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL!
Closes #48
This PR adds
normalized_name
keys toclassifiers.json
andextended_summary.json
, and uses the normalized name when saving manifest json files.To simplify retrieval of the manifests via the API, we normalize the name in the dynamic routing handler, so that the api endpoint for the manifest still uses the original name.