
Added file support and fixed uri typo #60

Open · wants to merge 1 commit into master
Conversation


@ulasfo commented May 29, 2021

Fix URIs that have a colon after the two slashes.
Add support for reading trackers from files.
Use at initialization via scraper.Scraper(trackerfile="filepaths") or add later by calling scraper.Addtrackfile("filepaths").
"filepaths" is a comma-separated list of paths. For a single file, pass a single path.

@ulasfo (Author) commented May 29, 2021

Also, in _connect_request a ConnectionResetError can be raised (for various reasons, such as the URL being blocked by an ISP).
This error was not handled and halted the scraping procedure.

Since the error is raised in _connect_request and not in scrape_tracker, I catch it in _connect_request and pass it back to scrape_tracker through the connection_id parameter. Not the cleanest approach, but it preserves the error.
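A minimal sketch of that error-forwarding pattern; apart from the _connect_request, scrape_tracker, and connection_id names, the bodies below are assumptions about the surrounding code:

```python
import socket

def _connect_request(sock, address):
    try:
        sock.sendto(b"\x00" * 16, address)  # placeholder connect packet
        response, _ = sock.recvfrom(2048)
        return response  # would normally be parsed into a connection id
    except ConnectionResetError as exc:
        # e.g. the tracker host is blocked by the ISP; return the
        # exception object in place of a connection id.
        return exc

def scrape_tracker(tracker_host):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(10)
    connection_id = _connect_request(sock, (tracker_host, 6969))
    if isinstance(connection_id, ConnectionResetError):
        # The "connection id" carries the captured error, so scraping
        # of the remaining trackers can continue.
        return {"tracker": tracker_host, "results": [], "error": str(connection_id)}
    # ... normal scrape path would continue here ...
    return {"tracker": tracker_host, "results": []}
```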

@49e94b8f256530dc0d41f740dfe8a4c1 (Collaborator) commented Jun 12, 2021

Thank you for the improvements. Is it okay if I change the base to develop?

Collaborator commented:

Nice feature! But we could add tests to cover it.

@@ -243,6 +270,9 @@ def scrape_tracker(self, tracker):
        results += _bad_infohashes
        return {"tracker": tracker_url, "results": results}

    def Addtrackfile(self, filename):  # comma-separated list of files to read trackers from


I can't find references to this method anywhere. Where is it used?

@@ -115,9 +118,27 @@ def get_good_infohashes(self) -> list:
        )
        return good_infohashes

    def get_trackers_viafile(self, trackers, filename):


Would you mind writing a test case to cover this? Thanks.
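A minimal pytest sketch for this method; the import path, the one-URL-per-line file format, and the return value of get_trackers_viafile are assumptions:

```python
from scraper import Scraper  # assumed import path

def test_get_trackers_viafile(tmp_path):
    # Hypothetical test: assumes the file lists one tracker URL per
    # line and that the method returns the combined tracker list.
    tracker_file = tmp_path / "trackers.txt"
    tracker_file.write_text(
        "udp://tracker.example.com:80/announce\n"
        "udp://tracker.example.org:6969/announce\n"
    )
    s = Scraper()
    trackers = s.get_trackers_viafile([], str(tracker_file))
    assert "udp://tracker.example.com:80/announce" in trackers
```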

@@ -83,7 +85,7 @@ def connect(self, timeout):

 class Scraper:
     def __init__(
-        self, trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
+        self, trackerfile: str = "", trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
     ):
         """
         Launches a scraper bound to a particular tracker


A docstring update would be good.
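A sketch of what the docstring update could look like; the signature mirrors the diff above, and the wording of the parameter descriptions is an assumption:

```python
from typing import List, Tuple

class Scraper:
    def __init__(
        self, trackerfile: str = "", trackers: List = [], infohashes: Tuple[List, str] = [], timeout: int = 10
    ):
        """
        Launches a scraper bound to a particular tracker

        :param trackerfile: comma-separated paths of files to read
            tracker URLs from; for a single file, pass a single path
        :param trackers: list of tracker URLs
        :param infohashes: list or comma-separated string of infohashes
        :param timeout: socket timeout in seconds
        """
```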

Author replied:

I will try to push an update to the docstring soon.

    logger.error("External tracker file not found: %s", e)
    # raise Exception("External tracker file not found: %s" % e)
else:
    file1 = open(filename, 'r')


I think we can use Path().open() to open and read the file too, so file1 = my_file.open().

https://docs.python.org/3/library/pathlib.html#pathlib.Path.open
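A minimal sketch of the suggested Path-based approach; the variable names and the example path are illustrative:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)
filename = "trackers.txt"  # illustrative path

my_file = Path(filename)
if not my_file.is_file():
    logger.error("External tracker file not found: %s", filename)
else:
    # Path.open() behaves like the builtin open(); using it as a
    # context manager also closes the handle.
    with my_file.open("r") as file1:
        trackers = [line.strip() for line in file1 if line.strip()]
```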
