Where is the data saved ? #585
-
I could not find anything in the documentation or config files. The only thing I found is an archived copy of a website I wanted to crawl.
Side question: is there an official or unofficial Discord server for the Internet Archive or Heritrix?
Note: I might be dumb.
Replies: 3 comments
-
I figured out that they get saved locally as WARC files.
-
I found this, which states that archive.org does not allow WARC uploads.
-
Heritrix saves data to WARC files in the jobs/{jobname}/{timestamp}/warcs subdirectory.

Most web archives only accept WARC files from trusted sources, not from the general public, as there isn't a way to guarantee the records are unaltered.
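
In case it helps anyone landing here, this is a minimal sketch for listing the WARC files a job has produced, based on the jobs/{jobname}/{timestamp}/warcs layout described above. The function name `find_warcs`, the install path `/opt/heritrix`, and the job name `myjob` are placeholders for illustration, not anything Heritrix defines, and the sketch assumes the default layout has not been changed in the job configuration.

```python
from pathlib import Path


def find_warcs(heritrix_home, job_name):
    """Return the WARC files of one job, across all launch timestamps, sorted by path."""
    # Assumed layout (from the reply above): jobs/{jobname}/{timestamp}/warcs
    job_dir = Path(heritrix_home) / "jobs" / job_name
    found = []
    # Match uncompressed (.warc) and gzip-compressed (.warc.gz) files;
    # anything with another suffix is ignored.
    for pattern in ("*/warcs/*.warc", "*/warcs/*.warc.gz"):
        found.extend(job_dir.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical install directory and job name; replace with your own.
    for path in find_warcs("/opt/heritrix", "myjob"):
        print(path, path.stat().st_size, "bytes")
```

Each launch of a job gets its own timestamp directory, so the wildcard in the first path component collects files from every launch of that job.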