
Added file support and fixed uri typo #60

Open · wants to merge 1 commit into master
Conversation


@ulasfo commented May 29, 2021

Fix URIs that have a colon after the two slashes.
Add support for reading trackers from files.
Use at initialization via scraper.Scraper(trackerfile="filepaths") or add later by calling scraper.Addtrackfile("filepaths").
"filepaths" is a comma-separated list of paths. For a single file, pass a single path.

@ulasfo (Author) commented May 29, 2021

Also, in _connect_request a ConnectionResetError can be raised (for various reasons, such as the URL being blocked by an ISP).
This error was not handled and halted the scraping procedure.

Since the error is raised in _connect_request and not in scrape_tracker, I catch it in _connect_request and pass it back to scrape_tracker through the connection_id parameter. Not the cleanest approach, but it preserves the error.
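A minimal sketch of that error-forwarding pattern; apart from the _connect_request, scrape_tracker, and connection_id names, the bodies below are assumptions about the surrounding code:

```python
import socket

def _connect_request(sock, address):
    try:
        sock.sendto(b"\x00" * 16, address)  # placeholder connect packet
        response, _ = sock.recvfrom(2048)
        return response  # would normally be parsed into a connection id
    except ConnectionResetError as exc:
        # e.g. the tracker host is blocked by the ISP; return the
        # exception object in place of a connection id.
        return exc

def scrape_tracker(tracker_host):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(10)
    connection_id = _connect_request(sock, (tracker_host, 6969))
    if isinstance(connection_id, ConnectionResetError):
        # The "connection id" carries the captured error, so scraping
        # of the remaining trackers can continue.
        return {"tracker": tracker_host, "results": [], "error": str(connection_id)}
    # ... normal scrape path would continue here ...
    return {"tracker": tracker_host, "results": []}
```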

@49e94b8f256530dc0d41f740dfe8a4c1 (Collaborator) commented Jun 12, 2021

Thank you for the improvements. Is it okay if I change the base to develop?

Collaborator commented:

Nice feature! But we could add tests to cover it.

@@ -243,6 +270,9 @@ def scrape_tracker(self, tracker):
        results += _bad_infohashes
        return {"tracker": tracker_url, "results": results}

    def Addtrackfile(self, filename):  # comma-separated list of files to read trackers from


I can't find references to this method anywhere. Where is it used?

@@ -115,9 +118,27 @@ def get_good_infohashes(self) -> list:
        )
        return good_infohashes

    def get_trackers_viafile(self, trackers, filename):


Would you mind writing a test case to cover this? Thanks.
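A minimal pytest sketch for this method; the import path, the one-URL-per-line file format, and the return value of get_trackers_viafile are assumptions:

```python
from scraper import Scraper  # assumed import path

def test_get_trackers_viafile(tmp_path):
    # Hypothetical test: assumes the file lists one tracker URL per
    # line and that the method returns the combined tracker list.
    tracker_file = tmp_path / "trackers.txt"
    tracker_file.write_text(
        "udp://tracker.example.com:80/announce\n"
        "udp://tracker.example.org:6969/announce\n"
    )
    s = Scraper()
    trackers = s.get_trackers_viafile([], str(tracker_file))
    assert "udp://tracker.example.com:80/announce" in trackers
```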

@@ -83,7 +85,7 @@ def connect(self, timeout):

 class Scraper:
     def __init__(
-        self, trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
+        self, trackerfile: str = "", trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
     ):
         """
         Launches a scraper bound to a particular tracker


A docstring update would be good.
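A sketch of what the docstring update could look like; the signature mirrors the diff above, and the wording of the parameter descriptions is an assumption:

```python
from typing import List, Tuple

class Scraper:
    def __init__(
        self, trackerfile: str = "", trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
    ):
        """
        Launches a scraper bound to a particular tracker

        :param trackerfile: comma-separated paths of files to read
            tracker URLs from; for a single file, pass a single path
        :param trackers: list of tracker URLs
        :param infohashes: list or comma-separated string of infohashes
        :param timeout: socket timeout in seconds
        """
```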

Author replied:

I will try to push an update to the docstring soon.

    logger.error("External tracker file not found: %s", e)
    # raise Exception("External tracker file not found: %s" % e)
else:
    file1 = open(filename, 'r')


I think we can use Path().open() to open and read the file too, so file1 = my_file.open().

https://docs.python.org/3/library/pathlib.html#pathlib.Path.open
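A minimal sketch of the suggested Path-based approach; the variable names and the example path are illustrative:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)
filename = "trackers.txt"  # illustrative path

my_file = Path(filename)
if not my_file.is_file():
    logger.error("External tracker file not found: %s", filename)
else:
    # Path.open() behaves like the builtin open(); using it as a
    # context manager also closes the handle.
    with my_file.open("r") as file1:
        trackers = [line.strip() for line in file1 if line.strip()]
```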
