Skip to content

Use normalized package names where suitable #52

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

DragaDoncila
Copy link
Contributor

Closes #48

This PR adds normalized_name keys to classifiers.json and extended_summary.json, and uses the normalized name when saving manifest json files.

To simplify retrieval of the manifests via the API, we normalize the name in the dynamic routing handler, so that the api endpoint for the manifest still uses the original name.

Copy link

vercel bot commented May 23, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
npe2api ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 30, 2025 5:49am

@DragaDoncila
Copy link
Contributor Author

I've left this PR as draft for now to get some feedback from you @aganders3 as to whether this is what you were imagining.

One thing is I think we'll need to clear the public/manifest/ folder and re-write all the manifests in order for the slug to work and to make sure we've deduplicated everything. We'll also of course need to rewrite the various summary files. Do we do that in this PR? I think so right?

@aganders3
Copy link
Collaborator

At first glance I think yes.

Sorry but it's a holiday weekend here so I probably won't get a chance to look at this until Tuesday!

@psobolewskiPhD
Copy link
Member

y'all think this will help with some of the name issues between - and _:
chanzuckerberg/napari-hub#1336
AdaptiveMotorControlLab/CellSeg3D#110

@DragaDoncila
Copy link
Contributor Author

@psobolewskiPhD yes this should address that issue. The idea is that you'd always be entering the real plugin name in URLs etc. but in the backend we also store the normalized name and use that for lookups and for serving the correct plugin.

@aganders3
Copy link
Collaborator

Hm, on napari-hub that possibly could be resolved server-side by normalizing the name in the URL, but I'm not sure that will work with the static hub-lite. I'll have to look into if there's a simple mechanism for that, otherwise it should be possible to simply duplicate pages with normalized/non-normalized names. I'm not sure right now if that's a good idea or not.

This PR is at least a step towards consistency in that regard, though.

@DragaDoncila
Copy link
Contributor Author

DragaDoncila commented May 28, 2025

@aganders3 I think maybe I'm confused. As I understand, the issue on the hub is that if you have a "non-normalized" name e.g. napari_svetlana, the hub was still creating the link as napari-svetlana, even though the plugin page was served at the "real" plugin name i.e. napari_svetlana.

I think what we want is for both napari-svetlana and napari_svetlana to serve the page, right? But if we normalize the name before lookup, as I do (I think...) in this PR in the slug.js, then both of those should retrieve the correct plugin page.

(Of course we'd also want to make sure we use the normalized name on hub-lite, but that would be a separate PR using the changes here)

@aganders3
Copy link
Collaborator

No, I think you are correct and I am confused. I was confused by this line in the README:

This triggers vercel to build the (fully static) API and deploy to https://npe2api.vercel.app/

My understanding was that with a static export, that code would be run only once on site generation (build). Looking closer now though I think that may not be the case. It looks like this path is what Next.js calls an "API route" and will indeed be handled on the server.

I thought npe2api was served via GitHub pages, but it's actually Vercel, so this makes sense!

I also learned "Route handlers" will let you do this client-side as well with some different configuration. So you can indeed do something similar with a fully-static deployment (like hub-lite).

@DragaDoncila
Copy link
Contributor Author

I also learned "Route handlers" will let you do this client-side as well with some different configuration. So you can indeed do something similar with a fully-static deployment (like hub-lite).

@aganders3 yep we discussed this at the Eurasia community meeting as well and we should be able to organize "redirects" from the non-normalized name to the normalized name and vice-versa.

At community meeting we decided we should always store and key by normalized name, and all files e.g. manifest files are named by the normalized name, and the API should be accessed via its normalized name. I'm going to update this PR to reflect that. I think it should make things a lot tidier.

@DragaDoncila
Copy link
Contributor Author

Ok @aganders3 I think this is kinda ready for a review (leaving the conflicts for now and will resolve just before merge).

The current state is that all files are keyed by normalized_name and store the true name in their values. All individual plugin files including the individual manifests, pypi and conda files are named using the normalized plugin name. This was done by first clearing these folders and then running the indexing workflow from this branch to repopulate each folder. Note that the github folder has not been updated, because fetching github information is essentially broken (has been for a while it turns out). I think we should tackle that in a separate PR?

If you try to access a manifest via the API you can do so using the true plugin name OR the normalized name. Check out the vercel preview above to play with it.

What does this mean for users of the API?

  • I've looked at napari-plugin-manager and it should not affect it. The plugin manager fetches extended_summary and keys into it, and we have not changed the value of any of the keys in the dictionary, just added a key. It does fetch the conda info, which now uses normalized names, but it was normalizing the package name anyway, so this should be a no-op.
  • Based on what I can glean of weather-report, it should also be unaffected. It grabs the index to get the count of plugins, it grabs the conda info just to map to the plugin's conda name, and it grabs extended_summary but then normalizes the name anyway, so this should be a no-op. @Czaki maybe you can confirm.
  • Hub only relies on the plugin index. Now, this index contains normalized names, but afaict this name will only be used to ping pypi for the plugin's info, and pypi takes normalized names with no issue.

@DragaDoncila DragaDoncila marked this pull request as ready for review May 30, 2025 08:02
Copy link
Collaborator

@aganders3 aganders3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! Maybe we should test the plugin manager against this manually by hard-coding the URL to the vercel preview? I can try to find time for that.

Note that the github folder has not been updated, because fetching github information is essentially broken (has been for a while it turns out). I think we should tackle that in a separate PR?

I think that's okay - I'm not even familiar with what this is meant to be doing, so will need to look into it.

@@ -188,7 +201,7 @@ def conda_data_wrapper(args):
{
**pkg,
"pypi_versions": sorted(
active_pypi_versions.get(pkg["name"], []), key=Version, reverse=True
active_pypi_versions.get(pkg["normalized_name"], {}).get("pypi_versions", []), key=Version, reverse=True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the fault of this PR, but it appears this sorting/deduplication is already done by bigquery.py (but not find_by_classifier.py). Maybe that can be unified somehow? My first thought is to remove it here and add it to find_by_classifier.py.


# sort things
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"].lower())
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters, but this sorting might be slightly different from before. Previously sorting by x["name"].lower() which is like - partly normalized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it feels like we are duplicating sorting in too many places. I think some of these scripts could be combined.

Comment on lines +148 to +153
# move the errors file to the top /public folder
desired_errors_path = PUBLIC / "errors.json"
current_error_path = Path(f"{PUBLIC}/manifest/errors.json")
if current_error_path.exists():
current_error_path.rename(desired_errors_path)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on why errors.json would not be in the "desired" location already. Can this be fixed in fetch_manifests.py if it's causing a problem?

I don't think there is (and maybe not possible there will be) a python package called errors but it seems like that would cause an issue here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, I think people have been confused by the errors.json file. It might be worth adding a little more text to index.js explaining what it is. Likewise we should link to the errors from the hub-lite FAQ page if people are looking for "why doesn't my plugin show up?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @aganders3. Was there a particular reason that you wanted to move the errors?

name: str
version: str
display_name: str
summary: str
author: str
license: str
home_page: str
project_url: List[str]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this just missing from #50?

I like having this interface, but maybe it makes sense to use a dataclass so items are not missed in the future.

No need to change that here!

@aganders3
Copy link
Collaborator

One note - it seems we lose some plugins here. I think we should investigate if these were duplicates or what.

npe2api on  main via  v20.18.1 via 🐍 v3.12.8 
❯ jq 'length' public/extended_summary.json
530

npe2api on  main via  v20.18.1 via 🐍 v3.12.8 
❯ find public/conda -type f | wc -l
     329

npe2api on  main via  v20.18.1 via 🐍 v3.12.8 
❯ find public/manifest -type f | wc -l 
     530

npe2api on  normalize-names via  v20.18.1 via 🐍 v3.12.8 took 4s 
❯ jq 'length' public/extended_summary.json
515

npe2api on  normalize-names via  v20.18.1 via 🐍 v3.12.8 
❯ find public/conda -type f | wc -l   
     314
     
npe2api on  normalize-names via  v20.18.1 via 🐍 v3.12.8 
❯ find public/manifest -type f | wc -l
     515

@aganders3
Copy link
Collaborator

After accounting for just differences in normalization (likely dupes on main?) I get this. I checked a few of these and they look like projects that have been renamed. Some also seem like they've been deleted from GitHub but not PyPI, maybe. I guess in general that does not cause the old versions to be removed, but I believe this PR starts with a clean slate. I need to think harder about what is desirable here.

>>> normalize - main
{'applet3',
'segmentree'}

>>> main - normalize
{'segtree',
'cryaleconcho',
'mkw-ropcop',
'appletree',
'napari-sim-simulator',
'motile-tracker',
'depalma-napari-omero',
'tracktour',
'ads-napari',
'napari-cryofibsem-imgpro',
'mmv-hitl4trk',
'napari-tomoslice',
'napari-brainways',
'morphospace'}

Copy link
Contributor

@willingc willingc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DragaDoncila While I'm not averse to merging as is, I think that there are a few improvements that can be made now, later, or never 😉

  • Split PR into scripts changes/pages changes from data changes (public). Not strictly necessary but a good practice in the future since it separates logic from applying the logic on the data.
  • use packaging library's normalization utility functions
  • reduce duplication across the scripts
  • consider renaming script files to be more descriptive of their purpose in the workflow

@@ -1,16 +1,33 @@
{
"alveoleye": null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why these are no longer sorted into the rest of the file? Was that by design?

Comment on lines +148 to +153
# move the errors file to the top /public folder
desired_errors_path = PUBLIC / "errors.json"
current_error_path = Path(f"{PUBLIC}/manifest/errors.json")
if current_error_path.exists():
current_error_path.rename(desired_errors_path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @aganders3. Was there a particular reason that you wanted to move the errors?

Comment on lines +159 to +160
mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")
if not mf_file.exists():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")
if not mf_file.exists():
manifest_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")
if not manifest_file.exists():

I would reduce the cognitive load and be explicit in the variable name.


# sort things
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"].lower())
PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it feels like we are duplicating sorting in too many places. I think some of these scripts could be combined.

@@ -0,0 +1,19 @@
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend removing this file completely and use the packaging libraries convenience utilities for normalization of names. I believe we are already using packaging in the other scripts. https://packaging.pypa.io/en/stable/utils.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Figure out name normalization for storing plugin info
4 participants