
Fix various issues #188

Merged
merged 9 commits into from
Jun 7, 2021

Conversation

pabs3
Contributor

@pabs3 pabs3 commented May 4, 2021

If necessary I can split this up into multiple pull requests, but then there will be some conflicts between them.

pabs3 added 7 commits May 3, 2021 17:44
This avoids problems related to URL encoding.

Obsoletes: hartator#116
Suggested-by: codespell, spellintian
This avoids the messages breaking JSON parsing when
the output is being redirected to a file and parsed.
This avoids producing JSON that is not parsable.
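The stdout/stderr separation those commit messages describe can be sketched as follows (a minimal illustration, not the project's actual code; the progress message and JSON payload are made up):

```ruby
require 'json'

# Progress messages go to stderr, so redirecting stdout to a file
# (e.g. `wayback_machine_downloader --list example.com > files.json`)
# captures only the machine-readable JSON.
$stderr.puts "Getting snapshot pages..."

# The JSON payload is the only thing written to stdout,
# so the redirected file stays parsable.
payload = JSON.generate([{ "file_url" => "example.com/index.html" }])
puts payload
```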
@m3nu

m3nu commented Jun 4, 2021

Works great. Used this branch when I needed to use this project. 👍

Hope this can be merged soon to keep the original repo relevant.

Owner

@hartator hartator left a comment

Sorry for the delay in checking things out.

Thanks @pabs3 for the work and all the typos fixed. It works great! ❤️

```diff
@@ -42,7 +42,7 @@ It will download the last version of every file present on Wayback Machine to `.
 -x, --exclude EXCLUDE_FILTER Skip downloading of urls that match this filter
 (use // notation for the filter to be treated as a regex)
 -a, --all Expand downloading to error files (40x and 50x) and redirections (30x)
--c, --concurrency NUMBER Number of multiple files to dowload at a time
+-c, --concurrency NUMBER Number of multiple files to download at a time
```
Owner

👍

```diff
@@ -62,7 +62,7 @@ Example:

 -s, --all-timestamps

-Optional. This option will download all timestamps/snapshots for a given website. It will uses the timepstamp of each snapshot as directory.
+Optional. This option will download all timestamps/snapshots for a given website. It will uses the timestamp of each snapshot as directory.
```
Owner

👍

```diff
@@ -169,7 +169,7 @@ Example:

 -c, --concurrency NUMBER

-Optional. Specify the number of multiple files you want to download at the same time. Allows to speed up the download of a website significantly. Default is to download one file at a time.
+Optional. Specify the number of multiple files you want to download at the same time. Allows one to speed up the download of a website significantly. Default is to download one file at a time.
```
Owner

👍 Thank you for all the typo corrections.

```diff
@@ -46,7 +46,7 @@ option_parser = OptionParser.new do |opts|
     options[:all] = true
   end

-  opts.on("-c", "--concurrency NUMBER", Integer, "Number of multiple files to dowload at a time", "Default is one file at a time (ie. 20)") do |t|
+  opts.on("-c", "--concurrency NUMBER", Integer, "Number of multiple files to download at a time", "Default is one file at a time (ie. 20)") do |t|
```
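For reference, the corrected opts.on line behaves like this self-contained sketch (the options hash key threads_count is an assumption for illustration; the real script's key may differ):

```ruby
require 'optparse'

options = { threads_count: 1 } # assumed default: one file at a time

OptionParser.new do |opts|
  opts.on("-c", "--concurrency NUMBER", Integer,
          "Number of multiple files to download at a time",
          "Default is one file at a time (ie. 20)") do |t|
    options[:threads_count] = t # OptionParser coerces NUMBER to Integer
  end
end.parse!(["--concurrency", "20"])

# options[:threads_count] is now the Integer 20
```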
Owner

@hartator hartator Jun 7, 2021

👍

```diff
-    file_timestamp = line[0..13].to_i
-    file_url = line[15..-2]
+    get_all_snapshots_to_consider.each do |file_timestamp, file_url|
+      next unless file_url.include?('/')
```
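A hypothetical sketch of what the new iteration replaces: instead of slicing each raw line by fixed offsets at the use site, the snapshot list is parsed once into [timestamp, url] pairs that callers can destructure (the sample line and variable names here are invented for illustration):

```ruby
# Made-up CDX-style line: a 14-digit timestamp, a space, the URL,
# and one trailing character stripped by the [15..-2] slice.
raw_lines = ["20210101000000 http://example.com/index.html)"]

# Parse each line once into a [timestamp, url] pair...
all_snapshots_to_consider = raw_lines.map do |line|
  [line[0..13].to_i, line[15..-2]]
end

# ...so every consumer can destructure the pair directly.
all_snapshots_to_consider.each do |file_timestamp, file_url|
  next unless file_url.include?('/')
  # download file_url as captured at file_timestamp ...
end
```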
Owner

👍

```diff
     end
     if page_index
-      parameters += "&page=#{page_index}"
+      parameters.push(["page", page_index])
```
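The reason [key, value] pairs are safer than string concatenation: Ruby's standard URI.encode_www_form percent-encodes each value individually, so characters like spaces, slashes, or '&' in a value cannot corrupt the query string. A minimal illustration (the parameter values are made up):

```ruby
require 'uri'

parameters = [["url", "example.com/a page"], ["output", "json"]]
page_index = 2
parameters.push(["page", page_index]) if page_index

# Each key and value is percent-encoded individually.
query = URI.encode_www_form(parameters)
# => "url=example.com%2Fa+page&output=json&page=2"
```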
Owner

👍

```diff
@@ -70,7 +70,7 @@ def tidy_bytes(force = false)
       if is_unused || is_restricted
         bytes[i] = tidy_byte(byte)
       elsif is_cont
-        # Not expecting contination byte? Clean up. Otherwise, now expect one less.
+        # Not expecting continuation byte? Clean up. Otherwise, now expect one less.
```
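For context on what a stray continuation byte is: in UTF-8, bytes 0x80–0xBF are valid only after a lead byte. As an illustration of the same cleanup idea (not the project's tidy_bytes implementation), modern Ruby can repair such sequences with String#scrub:

```ruby
# 0x80 is a continuation byte with no preceding lead byte,
# which makes the string invalid UTF-8.
bad = "caf\x80e".dup.force_encoding("UTF-8")

# scrub replaces each invalid byte with the given string.
clean = bad.scrub("?")
# => "caf?e"
```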
Owner

👍

@hartator hartator merged commit 66ff4d9 into hartator:master Jun 7, 2021
@hartator
Owner

hartator commented Jun 7, 2021

@pabs3 @m3nu I've published a new Gem version that includes these changes: 2.3.0.

@pabs3
Contributor Author

pabs3 commented Jun 7, 2021 via email

@hartator
Owner

hartator commented Jun 7, 2021

Excellent, thanks for that. I was thinking of packaging the project for Debian at some point. Would you be OK with that?

Sure, go for it. 👍

@pabs3 pabs3 deleted the fixes branch June 7, 2021 08:47
3 participants