Lack of Python package cache invalidation causes issues with Pyodide updates #137

Closed
@rblank

Description

Polyscript provides package loading functionality for the Pyodide backend through the packages config key (the link is to the PyScript docs; I couldn't find this documented for Polyscript). My understanding is that the feature works as follows:

  • The first time, Polyscript loads the packages via micropip. It then generates a lockfile via micropip.freeze(), and caches it in IndexedDB with a key corresponding to the string representation of the array of packages loaded. This happens in importPackages().
  • On later page loads, Polyscript looks up the package list in IndexedDB, and if it finds it, it creates a blob for the cached package list and sets the options.lockFileURL to point at that blob. It also sets the package list as options.packages, so that Pyodide loads the packages during startup. This happens in engine().
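
The two steps above can be modelled roughly as follows. This is only a sketch of the behaviour as described here, not Polyscript's actual code: the Map stands in for IndexedDB, cacheKey and planStartup are hypothetical names, String(packages) is an assumption about what "string representation" means, and the blob URL is a placeholder string.

```typescript
// Hypothetical stand-in for the IndexedDB store: cache key -> lockfile JSON.
type LockfileStore = Map<string, string>;

// Assumption: the key is the plain string form of the package array.
function cacheKey(packages: string[]): string {
  return String(packages);
}

interface StartupPlan {
  // Options that would be handed to Pyodide at startup.
  options: { packages?: string[]; lockFileURL?: string };
  // True when packages still have to be installed via micropip afterwards.
  installWithMicropip: boolean;
}

function planStartup(store: LockfileStore, packages: string[]): StartupPlan {
  const cached = store.get(cacheKey(packages));
  if (cached === undefined) {
    // First load: nothing cached, so micropip installs and the result is frozen later.
    return { options: {}, installWithMicropip: true };
  }
  // Later loads: point Pyodide at the cached lockfile and let it load the packages.
  // A real implementation would create a blob URL from `cached`; this is a placeholder.
  return {
    options: { packages, lockFileURL: "blob:cached-lockfile" },
    installWithMicropip: false,
  };
}
```

Note that nothing in this lookup depends on the Pyodide version or on the age of the entry, which is where the problems described next come from.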

AFAICT, the cache of lockfiles is never invalidated, except when setting package_cache: 'never' in the config (which clears the cache completely). This causes issues in at least two situations:

  • When updating Pyodide: The lockfile contains the Pyodide version. Pyodide checks it against its own version, and fails to load in case of mismatch.
  • When updating packages: While the cached lockfile ensures that the same package versions are always used, sometimes I do want a newer version of a package to be picked up.
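
To illustrate the first situation, the check below compares the version recorded in a lockfile with an expected Pyodide version. It is a sketch under assumptions: the lockfile is taken to be JSON with the version at info.version, and isLockfileCompatible is a hypothetical helper, not a Pyodide or Polyscript function.

```typescript
// Assumed minimal lockfile shape: only the field this sketch needs.
interface LockfileLike {
  info?: { version?: string };
}

// Returns true only if the lockfile parses and records the given Pyodide version.
function isLockfileCompatible(lockfileJson: string, pyodideVersion: string): boolean {
  try {
    const parsed = JSON.parse(lockfileJson) as LockfileLike | null;
    return parsed?.info?.version === pyodideVersion;
  } catch {
    // Unparseable lockfiles are treated as incompatible.
    return false;
  }
}
```

A lockfile cached under one Pyodide release fails this check after an upgrade, which corresponds to the load failure described above.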

I haven't found a good solution to either of these cases. Getting the current Pyodide version requires loading Pyodide, but this fails due to the version mismatch, so I cannot detect beforehand if I should set package_cache: 'never'. And for the packages, I would have to compare the cached lockfile to the current one; but the cache is an implementation detail, so that's not ideal.

My current workaround, also not ideal, is to leave packages unset and instead pass the package list under a different config key, my_packages, then retrieve it in Python and call await pyodide_js.loadPackage(polyscript.config.my_packages) from there. This increases the latency until Pyodide is ready, because packages cannot be loaded concurrently with Pyodide initialization.

My use case actually doesn't need the cache at all: I would be fine with Polyscript never loading packages itself, and instead passing packages directly as options.packages to Pyodide, without using micropip and freezing. But there is currently no way to do that. Setting package_cache: 'never' causes the packages to always be loaded with micropip, which has even worse latency than my workaround.

So this issue report is really two things:

  • A bug report for the cache invalidation issues related to Pyodide updates. Both ideas below require getting the Pyodide version before loading it, which I haven't found how to do, and they don't solve the issue with package updates.
    • Have Polyscript check the version in the cached lockfile against the Pyodide version, and invalidate the cache entry if there is a mismatch.
    • Include the Pyodide version in the cache key.
  • A feature request for disabling the package loading in Polyscript and passing the package list directly to Pyodide. Ideas:
    • Accept a pyodide_options config key that is merged with options. This would have the additional benefit of allowing more customization of Pyodide, e.g. setting fullStdLib, stdLibURL, env.
    • Accept package_cache: 'passthrough', and when this is set, set options.packages = packages and don't load packages.
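
A rough sketch of how these ideas could look, under stated assumptions: versionedCacheKey and buildPyodideOptions are hypothetical names, the 'passthrough' value and the pyodide_options key are the proposals above and do not exist today, and pyodideVersion is assumed to be known up front, which is exactly the open question for the bug.

```typescript
// Idea: include the Pyodide version in the cache key, so an upgrade misses the old entry.
function versionedCacheKey(pyodideVersion: string, packages: string[]): string {
  return JSON.stringify([pyodideVersion, packages]);
}

// Hypothetical subset of the config relevant to this sketch.
interface SketchConfig {
  packages?: string[];
  package_cache?: string;
  pyodide_options?: Record<string, unknown>;
}

interface SketchResult {
  // Options that would be passed to Pyodide at startup.
  options: Record<string, unknown>;
  // Whether Polyscript would still load packages itself (micropip plus cache).
  polyscriptLoadsPackages: boolean;
}

function buildPyodideOptions(config: SketchConfig): SketchResult {
  // Idea: merge user-supplied pyodide_options into the options object.
  const options: Record<string, unknown> = { ...(config.pyodide_options ?? {}) };
  const packages = config.packages ?? [];
  const passthrough = config.package_cache === "passthrough";
  if (passthrough && packages.length > 0) {
    // Idea: hand the list straight to Pyodide so it loads during initialization.
    options["packages"] = packages;
  }
  return { options, polyscriptLoadsPackages: !passthrough && packages.length > 0 };
}
```

With package_cache: 'passthrough' the package list ends up in options and Polyscript does no loading of its own; with any other value the existing behaviour is kept.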

I would be happy to send a PR for at least the feature request. For the bug, I'm still missing the bit about getting the Pyodide version, and how to handle package updates.
