Use normalized package names where suitable #52

DragaDoncila · 2025-05-23T08:53:45Z

Closes #48

This PR adds normalized_name keys to classifiers.json and extended_summary.json, and uses the normalized name when saving manifest json files.

To simplify retrieval of the manifests via the API, we normalize the name in the dynamic routing handler, so that the api endpoint for the manifest still uses the original name.

vercel · 2025-05-23T08:53:49Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
npe2api	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	May 30, 2025 5:49am

DragaDoncila · 2025-05-23T08:56:27Z

I've left this PR as draft for now to get some feedback from you @aganders3 as to whether this is what you were imagining.

One thing is I think we'll need to clear the public/manifest/ folder and re-write all the manifests in order for the slug to work and to make sure we've deduplicated everything. We'll also of course need to rewrite the various summary files. Do we do that in this PR? I think so right?

aganders3 · 2025-05-24T02:40:20Z

At first glance I think yes.

Sorry but it's a holiday weekend here so I probably won't get a chance to look at this until Tuesday!

psobolewskiPhD · 2025-05-24T19:25:04Z

y'all think this will help with some of the name issues between - and _:
chanzuckerberg/napari-hub#1336
AdaptiveMotorControlLab/CellSeg3D#110

DragaDoncila · 2025-05-28T00:31:44Z

@psobolewskiPhD yes this should address that issue. The idea is that you'd always be entering the real plugin name in URLs etc. but in the backend we also store the normalized name and use that for lookups and for serving the correct plugin.

aganders3 · 2025-05-28T00:46:31Z

Hm, on napari-hub that possibly could be resolved server-side by normalizing the name in the URL, but I'm not sure that will work with the static hub-lite. I'll have to look into if there's a simple mechanism for that, otherwise it should be possible to simply duplicate pages with normalized/non-normalized names. I'm not sure right now if that's a good idea or not.

This PR is at least a step towards consistency in that regard, though.

DragaDoncila · 2025-05-28T00:50:46Z

@aganders3 I think maybe I'm confused. As I understand, the issue on the hub is that if you have a "non-normalized" name e.g. napari_svetlana, the hub was still creating the link as napari-svetlana, even though the plugin page was served at the "real" plugin name i.e. napari_svetlana.

I think what we want is for both napari-svetlana and napari_svetlana to serve the page, right? But if we normalize the name before lookup, as I do (I think...) in this PR in the slug.js, then both of those should retrieve the correct plugin page.

(Of course we'd also want to make sure we use the normalized name on hub-lite, but that would be a separate PR using the changes here)

aganders3 · 2025-05-28T14:31:56Z

No, I think you are correct and I am confused. I was confused by this line in the README:

This triggers vercel to build the (fully static) API and deploy to https://npe2api.vercel.app/

My understanding was that with a static export, that code would be run only once on site generation (build). Looking closer now though I think that may not be the case. It looks like this path is what Next.js calls an "API route" and will indeed be handled on the server.

I thought npe2api was served via GitHub pages, but it's actually Vercel, so this makes sense!

I also learned "Route handlers" will let you do this client-side as well with some different configuration. So you can indeed do something similar with a fully-static deployment (like hub-lite).

DragaDoncila · 2025-05-28T22:00:47Z

I also learned "Route handlers" will let you do this client-side as well with some different configuration. So you can indeed do something similar with a fully-static deployment (like hub-lite).

@aganders3 yep we discussed this at the Eurasia community meeting as well and we should be able to organize "redirects" from the non-normalized name to the normalized name and vice-versa.

At community meeting we decided we should always store and key by normalized name, and all files e.g. manifest files are named by the normalized name, and the API should be accessed via its normalized name. I'm going to update this PR to reflect that. I think it should make things a lot tidier.

This reverts commit bf9020e.

DragaDoncila · 2025-05-30T08:02:07Z

Ok @aganders3 I think this is kinda ready for a review (leaving the conflicts for now and will resolve just before merge).

The current state is that all files are keyed by normalized_name and store the true name in their values. All individual plugin files including the individual manifests, pypi and conda files are named using the normalized plugin name. This was done by first clearing these folders and then running the indexing workflow from this branch to repopulate each folder. Note that the github folder has not been updated, because fetching github information is essentially broken (has been for a while it turns out). I think we should tackle that in a separate PR?

If you try to access a manifest via the API you can do so using the true plugin name OR the normalized name. Check out the vercel preview above to play with it.

What does this mean for users of the API?

I've looked at napari-plugin-manager and it should not affect it. The plugin manager fetches extended_summary and keys into it, and we have not changed the value of any of the keys in the dictionary, just added a key. It does fetch the conda info, which now uses normalized names, but it was normalizing the package name anyway, so this should be a no-op.
Based on what I can glean of weather-report, it should also be unaffected. It grabs the index to get the count of plugins, it grabs the conda info just to map to the plugin's conda name, and it grabs extended_summary but then normalizes the name anyway, so this should be a no-op. @Czaki maybe you can confirm.
Hub only relies on the plugin index. Now, this index contains normalized names, but afaict this name will only be used to ping pypi for the plugin's info, and pypi takes normalized names with no issue.

aganders3

This looks good to me! Maybe we should test the plugin manager against this manually by hard-coding the URL to the vercel preview? I can try to find time for that.

Note that the github folder has not been updated, because fetching github information is essentially broken (has been for a while it turns out). I think we should tackle that in a separate PR?

I think that's okay - I'm not even familiar with what this is meant to be doing, so will need to look into it.

aganders3 · 2025-05-30T14:15:16Z

scripts/reindex.py

@@ -188,7 +201,7 @@ def conda_data_wrapper(args):
        {
            **pkg,
            "pypi_versions": sorted(
-                active_pypi_versions.get(pkg["name"], []), key=Version, reverse=True
+                active_pypi_versions.get(pkg["normalized_name"], {}).get("pypi_versions", []), key=Version, reverse=True


Not the fault of this PR, but it appears this sorting/deduplication is already done by bigquery.py (but not find_by_classifier.py). Maybe that can be unified somehow? My first thought is to remove it here and add it to find_by_classifier.py.

aganders3 · 2025-05-30T14:24:56Z

scripts/reindex.py


    # sort things
-    PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"].lower())
+    PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"])


I don't think it matters, but this sorting might be slightly different from before. Previously sorting by x["name"].lower() which is like - partly normalized?

Yeah, it feels like we are duplicating sorting in too many places. I think some of these scripts could be combined.

aganders3 · 2025-05-30T14:38:13Z

scripts/reindex.py

+    # move the errors file to the top /public folder
+    desired_errors_path = PUBLIC / "errors.json"
+    current_error_path = Path(f"{PUBLIC}/manifest/errors.json")
+    if current_error_path.exists():
+        current_error_path.rename(desired_errors_path)
+


I'm not clear on why errors.json would not be in the "desired" location already. Can this be fixed in fetch_manifests.py if it's causing a problem?

I don't think there is (and maybe not possible there will be) a python package called errors but it seems like that would cause an issue here.

Relatedly, I think people have been confused by the errors.json file. It might be worth adding a little more text to index.js explaining what it is. Likewise we should link to the errors from the hub-lite FAQ page if people are looking for "why doesn't my plugin show up?"

I tend to agree with @aganders3. Was there a particular reason that you wanted to move the errors?

aganders3 · 2025-05-30T14:43:35Z

scripts/reindex.py

    name: str
    version: str
    display_name: str
    summary: str
    author: str
    license: str
    home_page: str
+    project_url:  List[str]


Was this just missing from #50?

I like having this interface, but maybe it makes sense to use a dataclass so items are not missed in the future.

No need to change that here!

aganders3 · 2025-05-30T15:13:54Z

One note - it seems we lose some plugins here. I think we should investigate if these were duplicates or what.

npe2api on  main via  v20.18.1 via 🐍 v3.12.8 
❯ jq 'length' public/extended_summary.json
530

npe2api on  main via  v20.18.1 via 🐍 v3.12.8 
❯ find public/conda -type f | wc -l
     329

npe2api on  main via  v20.18.1 via 🐍 v3.12.8 
❯ find public/manifest -type f | wc -l 
     530

npe2api on  normalize-names via  v20.18.1 via 🐍 v3.12.8 took 4s 
❯ jq 'length' public/extended_summary.json
515

npe2api on  normalize-names via  v20.18.1 via 🐍 v3.12.8 
❯ find public/conda -type f | wc -l   
     314
     
npe2api on  normalize-names via  v20.18.1 via 🐍 v3.12.8 
❯ find public/manifest -type f | wc -l
     515

aganders3 · 2025-05-30T16:28:30Z

After accounting for just differences in normalization (likely dupes on main?) I get this. I checked a few of these and they look like projects that have been renamed. Some also seem like they've been deleted from GitHub but not PyPI, maybe. I guess in general that does not cause the old versions to be removed, but I believe this PR starts with a clean slate. I need to think harder about what is desirable here.

>>> normalize - main
{'applet3',
'segmentree'}

>>> main - normalize
{'segtree',
'cryaleconcho',
'mkw-ropcop',
'appletree',
'napari-sim-simulator',
'motile-tracker',
'depalma-napari-omero',
'tracktour',
'ads-napari',
'napari-cryofibsem-imgpro',
'mmv-hitl4trk',
'napari-tomoslice',
'napari-brainways',
'morphospace'}

willingc

@DragaDoncila While I'm not averse to merging as is, I think that there are a few improvements that can be made now, later, or never 😉

Split PR into scripts changes/pages changes from data changes (public). Not strictly necessary but a good practice in the future since it separates logic from applying the logic on the data.
use packaging library's normalization utility functions
reduce duplication across the scripts
consider renaming script files to be more descriptive of their purpose in the workflow

willingc · 2025-05-30T20:31:15Z

public/conda.json

@@ -1,16 +1,33 @@
 {
+  "alveoleye": null,


Do we know why these are no longer sorted into the rest of the file? Was that by design?

willingc · 2025-05-30T20:42:15Z

scripts/reindex.py

+    # move the errors file to the top /public folder
+    desired_errors_path = PUBLIC / "errors.json"
+    current_error_path = Path(f"{PUBLIC}/manifest/errors.json")
+    if current_error_path.exists():
+        current_error_path.rename(desired_errors_path)
+


I tend to agree with @aganders3. Was there a particular reason that you wanted to move the errors?

willingc · 2025-05-30T20:46:04Z

scripts/reindex.py

+        mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")
+        if not mf_file.exists():


Suggested change

mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")

if not mf_file.exists():

manifest_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")

if not manifest_file.exists():

I would reduce the cognitive load and be explicit in the variable name.

willingc · 2025-05-30T20:48:22Z

scripts/reindex.py


    # sort things
-    PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"].lower())
+    PYPI_INDEX = sorted(PYPI_INDEX, key=lambda x: x["name"])


Yeah, it feels like we are duplicating sorting in too many places. I think some of these scripts could be combined.

willingc · 2025-05-30T20:53:03Z

scripts/utils.py

@@ -0,0 +1,19 @@
+import re


I would recommend removing this file completely and use the packaging libraries convenience utilities for normalization of names. I believe we are already using packaging in the other scripts. https://packaging.pypa.io/en/stable/utils.html

Draga Doncila added 7 commits April 25, 2025 14:13

Add normalizing func

3de1ae3

normalize name in classifiers.json

a34567a

Add normalized name in extended_summary

617d324

Rename manifest files using normalized name and reroute api endpoint

bfb162e

Revert changes to public

206fe14

Merge branch 'main' into normalize-names

6238b63

Small tweaks and tidying

d064c36

Use normalized name as keys

e8dae83

vercel bot deployed to Preview May 30, 2025 03:14 View deployment

[bot] Update Plugin Index

bf9020e

vercel bot deployed to Preview May 30, 2025 03:30 View deployment

Revert "[bot] Update Plugin Index"

1875286

This reverts commit bf9020e.

vercel bot deployed to Preview May 30, 2025 03:33 View deployment

Draga Doncila added 2 commits May 30, 2025 13:55

Normalize names in pypi folder, index and conda folder

5dbf0b4

Empty public folders except github folder to populate afresh

e97be89

vercel bot deployed to Preview May 30, 2025 03:57 View deployment

Fix import

01f6073

vercel bot deployed to Preview May 30, 2025 05:05 View deployment

Fix it again smh

8fe3c8b

vercel bot deployed to Preview May 30, 2025 05:10 View deployment

Add some prints to figure out where error comes from

96ce8a8

vercel bot deployed to Preview May 30, 2025 05:28 View deployment

Check directory listing?

49ed995

vercel bot deployed to Preview May 30, 2025 05:38 View deployment

make dir if it doesn't exist...

6222de1

vercel bot deployed to Preview May 30, 2025 05:43 View deployment

[bot] Update Plugin Index

e2586ce

vercel bot deployed to Preview May 30, 2025 05:49 View deployment

DragaDoncila marked this pull request as ready for review May 30, 2025 08:02

aganders3 approved these changes May 30, 2025

View reviewed changes

willingc approved these changes May 30, 2025

View reviewed changes

		mf_file = Path(f"{PUBLIC}/manifest/{normalized_name}.json")
		if not mf_file.exists():

Use normalized package names where suitable #52

Are you sure you want to change the base?

Use normalized package names where suitable #52

Conversation

DragaDoncila commented May 23, 2025

Uh oh!

vercel bot commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DragaDoncila commented May 23, 2025

Uh oh!

aganders3 commented May 24, 2025

Uh oh!

psobolewskiPhD commented May 24, 2025

Uh oh!

DragaDoncila commented May 28, 2025

Uh oh!

aganders3 commented May 28, 2025

Uh oh!

DragaDoncila commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aganders3 commented May 28, 2025

Uh oh!

DragaDoncila commented May 28, 2025

Uh oh!

DragaDoncila commented May 30, 2025

Uh oh!

aganders3 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aganders3 commented May 30, 2025

Uh oh!

aganders3 commented May 30, 2025

Uh oh!

willingc left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vercel bot commented May 23, 2025 •

edited

Loading

DragaDoncila commented May 28, 2025 •

edited

Loading