This repository has been archived by the owner on Sep 17, 2020. It is now read-only.
I have WARC files collected with node-warc 3.1.0 that cannot be opened in Webrecorder player (No pages found). The only distinguishing characteristic is that the files are archived from Facebook posts with long URLs. Other files archived with the same tool seem to work fine.
Listing URLs from the WARC with warcio works. Not sure if this is a bug in Webrecorder player or node-warc. Example file in the related node-warc issue: N0taN3rd/node-warc#25
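For anyone who wants to reproduce the URL listing, here is a minimal sketch of that check with warcio's `ArchiveIterator`. The helper names `response_urls` and `list_warc_urls` are made up for illustration, and the choice to list only `response` records is an assumption, not necessarily what was run for this report.

```python
def response_urls(records):
    """Yield the WARC-Target-URI of every response record.

    `records` is any iterable of warcio-style records, i.e. objects with
    `rec_type` and `rec_headers.get_header(name)`.
    """
    for record in records:
        if record.rec_type == "response":
            yield record.rec_headers.get_header("WARC-Target-URI")


def list_warc_urls(path):
    """Return the response URLs found in the WARC file at `path`."""
    # Third-party dependency: pip install warcio
    from warcio.archiveiterator import ArchiveIterator

    with open(path, "rb") as stream:
        # Build the list while the file is still open, because
        # ArchiveIterator reads lazily from the stream.
        return list(response_urls(ArchiveIterator(stream)))
```

Calling `list_warc_urls("example.warc.gz")` (a placeholder file name) should return the long Facebook URLs if the records themselves are intact, which would point at page detection rather than at the WARC contents.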
Version details:
webrecorder player 1.6.1 (Mac)
webrecorder 4.1.5 (@e926c65)
pywb 2.1.1 (@3e0bb49)
har2warc 1.0.4
warcio 1.6.2
To clarify, the issue is not that the file doesn't load, it's related to the page detection.
To make things easier for the user, when opening non-Webrecorder WARCs, we attempt to 'detect' which URLs are pages, and you are right that long URLs are occasionally rejected. (The other option is for squidwarc to write the page metadata directly, as WR does; @N0taN3rd and I are looking into that as well.)
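To illustrate how a length check in page detection can lead to "No pages found", here is a rough sketch of such a heuristic. It is not the player's actual code: the `Capture` type, the `looks_like_page` function, the status and content-type rules, and the 200-character cutoff are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff, chosen only for this illustration.
MAX_PAGE_URL_LENGTH = 200


@dataclass
class Capture:
    """Hypothetical summary of one response record in a WARC."""
    url: str
    status: int
    content_type: str


def looks_like_page(capture, max_url_length=MAX_PAGE_URL_LENGTH):
    """Guess whether a capture is a top-level page (assumed rules)."""
    # Ignore parameters such as "; charset=utf-8".
    media_type = capture.content_type.split(";")[0].strip().lower()
    return (
        capture.status == 200
        and media_type == "text/html"
        and len(capture.url) <= max_url_length
    )


def detect_pages(captures):
    """Return the URLs of all captures that pass the heuristic."""
    return [c.url for c in captures if looks_like_page(c)]
```

Under these assumed rules, a WARC whose only HTML response has a URL longer than the cutoff produces an empty page list, even though the record is present and replayable.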
OpenWayback does not have any such page detection, but allows you to enter URLs directly. We also need to add support for loading an arbitrary URL that you know, even if it's not detected as a page.
We plan to make exploring the WARC easier as well.
ikreymer changed the title from "Problem playing back WARC with long URL:s" to "Support directly replay of URLs that are not pages (was: Problem playing back WARC with long URL:s)" on Dec 12, 2018