Skip to content

masbuz/gbook-downloader

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Google Book Downloader

Well, Google is great company I should always show my respects. I create gbook-downloader
project is not going to make some offenses to the company. I am a very curious guy, and want to
find the answer for the question: If one book is composed with 500 pages and only 20 random pages
is public to individual, how many proxies/requests should send in order to get one completed book?

WARNING: The project is setup with anti-business purpose, you are warned NOT to adopt this opensource
software for ANY commercial usage.

Design

  ProxyProviderA --+                                                      +---> images
  ProxyProviderB --|                url              page          output | 
  ...            --|--> ProxyManager -> PageCollector -> BookWeaver ------|---> pdf 
  ...            --| ^                                                    |
  ProxyProviderZ --+ |                                                    +---> html
                     |
                 test proxy
  • ProxyProvider is the public proxy provider on Internet, it provides a list of public proxies. Some are accessed to you while others are not, so the proxy instance should be test
  • ProxyManager invokes ProxyProvider one by one, and states the common proxy tests. If the proxy instance passes the tests, it should be cached, meanwhile, PageCollector adopts the proxy instance candidate.
  • PageCollector takes preview-book-url as input, and try to find as many pages as it can. When the new page is found, it should download to local disk. The result of PageCollector is page array with page name, order and local path.
  • BookWeaver takes the local page collection as input and merges it into one PDF document or HTML document.
    while((downloaded_page_count < total_page_count) || (proxy_instance_candidate_pool.size > 0)) do
      proxy_instance = proxy_instance_candidate_pool.select
      if(proxy_manager.test(proxy_instance))
        page_collector.adopt(proxy_instance).collect
      end
    end

    report.summary
    html_book_weaver.weave(output_dir)

Samples

I take the book Sometimes It’s Turkey. Sometimes It’s Feathers as sample which is composed with 32 pages.

Usage

Below is the guide for installling GoogleBookDownloader and how to use it.

  • Install

    sudo gem install gbook-downloader
  • Usage
  • update/setup proxy database
    gbook-proxy updatedb
  1. download google book
    gbook-downloader book-url -d book-output-dir
    sample:
    gbook-downloader http://books.google.com/books?id=Tmy8LAaVka8C -d ~/gbooks

About

Google Book Downloader

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Ruby 100.0%