Fix Wayback Machine imports on Chrome and improve URL cleaning #745
Code and description are AI-generated
This PR fixes issues when importing releases from the Internet Archive (Wayback Machine), specifically targeting Chrome compatibility and URL sanitization. #466
Note
For this to work reliably in Chrome with Tampermonkey, the Content Script API needs to be set to `UserScripts API Dynamic`.

The Problem
- The script relied on the `beforescriptexecute` event to patch the Wayback Machine's rewriter (wombat.js). This event is specific to Firefox and was removed from the HTML5 spec, so the patch failed silently on Chrome/Chromium and the "Import" button broke on archived pages.
- Imported URLs kept the `web.archive.org` prefix in the Release URL, Label URL, and License URL (e.g., https://web.archive.org/web/.../https://...).

The Solution
Wombat Patch
Replaced the Firefox-only `beforescriptexecute` listener with an `Object.defineProperty` hook. This intercepts the creation of the global `_WBWombat` object, allowing us to inject `no_rewrite_prefixes` (excluding MusicBrainz URLs) safely on all browsers, including Chrome, before the archiver initializes.

URL Cleaning
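As a rough illustration of the prefix stripping covered in this section, here is a minimal sketch. The helper name `stripWaybackPrefix` and the exact regex are assumptions made for illustration and are not the PR's actual `fix_bandcamp_url` code. It assumes archived URLs have the form `/web/<timestamp><optional modifier>/<original URL>`, with or without the `web.archive.org` host in front.

```javascript
// Hypothetical helper, NOT the PR's fix_bandcamp_url implementation.
// Assumed prefix shape: [https://web.archive.org]/web/<digits><optional modifier like "if_">/
const WAYBACK_PREFIX = /^(?:https?:\/\/web\.archive\.org)?\/web\/\d+[a-z_]*\//i;

function stripWaybackPrefix(url) {
  // Drop the archive prefix, if present, leaving the originally captured URL.
  const original = url.replace(WAYBACK_PREFIX, '');
  // Normalize a plain-HTTP scheme to HTTPS.
  return original.replace(/^http:\/\//i, 'https://');
}
```

URLs without an archive prefix pass through unchanged apart from the HTTPS normalization. Captures whose original URL lacks a scheme are not handled by this sketch.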
- Updated `String.prototype.fix_bandcamp_url` with a regex to detect and strip Wayback Machine prefixes (`/web/YYYYMMDD.../`), ensuring all imported URLs are "clean" and use HTTPS.
- Applied `fix_bandcamp_url()` to the License (Creative Commons) link and Label back-links, which previously used the raw, "dirty" DOM attributes.

Changes
- Updated `fix_bandcamp_url` to strip archive prefixes.
- Applied it to `ccIcons` (License) and `labelbacklink` extraction.
- Replaced the `window.location.hostname === 'web.archive.org'` logic block with the new `Object.defineProperty` implementation.

Testing
Confirm that the Release URL, Label URL, and License URL in the editor are the original Bandcamp URLs, not Archive.org links.
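The `Object.defineProperty` approach from the Wombat Patch section can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function name `hookWombat` is hypothetical, and the sketch assumes that the archived page assigns a function to `_WBWombat` which is later called with `(window, info)`, with options stored under `info.wombat_opts`. Only the names `_WBWombat` and `no_rewrite_prefixes` come from the description above.

```javascript
// Hypothetical sketch of intercepting a global before the page defines it.
// Assumption: the page later does `win._WBWombat = function (wbwindow, wbinfo) {...}`
// and reads options from `wbinfo.wombat_opts` (assumed location).
function hookWombat(win, extraPrefixes) {
  let wrapped; // value returned when the page reads win._WBWombat

  Object.defineProperty(win, '_WBWombat', {
    configurable: true,
    enumerable: true,
    get() {
      return wrapped;
    },
    set(original) {
      if (typeof original !== 'function') {
        wrapped = original; // nothing to wrap, store as-is
        return;
      }
      wrapped = function (wbwindow, wbinfo) {
        const info = wbinfo || {};
        const opts = info.wombat_opts || (info.wombat_opts = {});
        // Append our prefixes so matching URLs are left unrewritten.
        opts.no_rewrite_prefixes = (opts.no_rewrite_prefixes || []).concat(extraPrefixes);
        // Preserve both `new _WBWombat(...)` and plain-call usage.
        return new.target
          ? Reflect.construct(original, [wbwindow, info])
          : original.call(this, wbwindow, info);
      };
    },
  });
}

// Possible usage in a userscript running at document-start (values are illustrative):
// hookWombat(window, ['https://musicbrainz.org/']);
```

Because the setter runs at the moment the page assigns the global, this does not depend on any browser-specific script events. It does require the userscript to run before the archive's own script, which is presumably why the Tampermonkey setting in the note above matters.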