Description
When we publish a package today, we specify whether it is a Legacy or Current package. This source is quite old, from our first cut at the importer scripts, and its scope has diminished significantly over time. Nowadays it solely controls whether to fail the publishing pipeline if a package fails to compile. Legacy packages can fail compilation and Current packages cannot.
Today we use Legacy when using the legacy importer and Current when using GitHub issues or the legacy importer in update-registry mode. This is a problem because if a package fails to compile and is rejected by GitHub issues / update-registry mode, and then we run the legacy importer locally, the package will be registered despite failing to compile.
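To make the loophole concrete, here is a minimal sketch of the current behavior as described above. The names `Source` and `publish` are hypothetical stand-ins for illustration, not the registry's actual code:

```python
from enum import Enum


class Source(Enum):
    """Hypothetical stand-in for the Legacy vs. Current flag passed when publishing."""
    LEGACY = "legacy"
    CURRENT = "current"


def publish(source: Source, compiles: bool) -> bool:
    """Return True if the package ends up registered."""
    if not compiles and source is Source.CURRENT:
        # A compile failure rejects a Current package.
        return False
    # Legacy packages are registered even when compilation fails.
    return True


# The same non-compiling package, submitted through two different routes.
rejected_via_github = publish(Source.CURRENT, compiles=False)
registered_via_legacy_importer = publish(Source.LEGACY, compiles=False)
```

The outcome depends on which route submitted the package rather than on anything about the package itself.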
I think that we should either a) change the Legacy vs. Current distinction to be based on package publication date, where legacy packages are before a certain cutoff date, or b) remove the Legacy vs. Current distinction altogether, requiring all packages in the registry to compile (including legacy packages).
My preference is to remove the distinction altogether, for the same reason we decided to remove packages from the registry if they don't solve: you should be able to install and build any package from the registry.
Option 1: Legacy Cutoff Date
If we choose to retain the legacy vs. current distinction, then I think we should choose a specific date cutoff after which a package is not considered 'legacy.' I would say that September 1, 2022 makes sense as a cutoff, since that's when we "launched" the registry and deprecated the old package sets. If the package was tagged after this date then it's considered a current package.
We know the package publication date because it's used in the package metadata; we determine this when fetching the package source from GitHub. We may not be able to determine the publication date for non-Git packages when we support more Locations in the future, but for those we can just assume time of registration as the publication date.
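The cutoff check could look roughly like the sketch below. The function name `is_legacy`, the use of UTC, and the handling of a package tagged exactly on the cutoff are assumptions for illustration, since none of these is pinned down above:

```python
from datetime import datetime, timezone
from typing import Optional

# Proposed cutoff: September 1, 2022 (interpreted here as midnight UTC, an assumption).
LEGACY_CUTOFF = datetime(2022, 9, 1, tzinfo=timezone.utc)


def is_legacy(published_at: Optional[datetime], registered_at: datetime) -> bool:
    """Classify a package as legacy if its effective publication date precedes the cutoff.

    When the publication date is unknown (e.g. a non-Git location), fall back
    to the time of registration, as suggested above.
    """
    effective = published_at if published_at is not None else registered_at
    return effective < LEGACY_CUTOFF
```

For example, a package tagged in 2021 would be legacy, while a package with no known publication date that is registered in 2023 would be current.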
Upside: Easy to implement.
Downside: We still have special-casing for 'legacy' vs. 'current' packages in the pipeline. Legacy packages may be broken (they do not compile).
Option 2: All Packages Compile
Alternatively, we can remove the legacy vs. current distinction altogether if we choose to only allow packages in the registry if they compile. We have already gone back through the registry to enforce that all packages solve; this would be upping the ante by also enforcing that they compile.
This feels like the ideal solution, but it's made difficult because when we are using the legacy importer we must identify a specific compiler to use before hitting the publish pipeline. Identifying a compiler can be easy or difficult, depending on the circumstances.
I believe these heuristics will allow us to reliably choose a compiler to use to compile packages imported from Bower & Spago:
- If the package uses Spago then we can identify the compiler version from the package set in use.
- If the package uses Bower and has no dependencies, then we can use the package publication date to infer what compiler would have been active around that time; few packages exist that have no dependencies, so to make this more robust we could try the most recent compiler at that time, and then if that fails try a prior version, too.
- If the package uses Bower and has dependencies, and we've already done #255 (Suggestion: Include compiler ranges in package metadata), then we can take the intersection of the compilers supported by its dependencies and choose, perhaps, the most recent compiler from that range.
We can then use the selected compiler version to run the publish pipeline as usual.
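The two Bower heuristics above can be sketched as follows; the Spago case is a direct lookup in the package set and is omitted. Every name here (`candidates_by_date`, `candidates_by_dependencies`, `select_compiler`) is hypothetical, and the release table and per-dependency compiler sets are assumed to be supplied by the caller:

```python
from datetime import date
from typing import Callable, Optional

Version = tuple[int, int, int]


def candidates_by_date(
    releases: dict[Version, date], published: date, limit: int = 2
) -> list[Version]:
    """Bower, no dependencies: compilers released on or before the publication
    date, most recently released first, keeping at most `limit` to try."""
    available = [v for v, released in releases.items() if released <= published]
    available.sort(key=lambda v: releases[v], reverse=True)
    return available[:limit]


def candidates_by_dependencies(supported: list[set[Version]]) -> list[Version]:
    """Bower, with dependencies: compilers supported by every dependency,
    highest version first. Each set holds one dependency's supported compilers."""
    if not supported:
        return []
    common = set.intersection(*supported)
    return sorted(common, reverse=True)


def select_compiler(
    candidates: list[Version], compiles: Callable[[Version], bool]
) -> Optional[Version]:
    """Return the first candidate that compiles the package, or None if none do."""
    for version in candidates:
        if compiles(version):
            return version
    return None
```

Returning `None` rather than raising leaves the decision about what to do with a package that no candidate compiler accepts (reject it, flag it for manual review) to the caller.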
Upside: Every package in the registry is known to solve & compile. There is no distinction between a "legacy" and "non-legacy" package, and there is no special-cased code.
Downside: Our heuristics may be wrong, in which case we would incorrectly delete packages from the registry. More complicated to implement.