
Feature: Add option to return the crawled website body in the response #8

@indrajithi

Description


Currently we do not return the HTML body from the crawled sites. We only return the links we find.

  • Should have a flag to toggle this option
  • Default set to False
  • Modify the JSON response to have keys ['urls', 'body']

E.g., the current response:

{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            "..."
        ]
    },
    "https://github.com/solutions/ci-cd": {
        "urls": [
            "https://github.com/solutions/ci-cd/",
            "https://githubuniverse.com/",
            "..."
        ]
    }
}

This feature should return the HTML body as well, and the result should look like this:

{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            "..."
        ],
        "body": "<html>stuff</html>"
    },
    "https://github.com/solutions/ci-cd": {
        "urls": [
            "https://github.com/solutions/ci-cd/",
            "https://githubuniverse.com/",
            "..."
        ],
        "body": "<html>other stuff</html>"
    }
}
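
A minimal sketch of how the toggle could shape the response, assuming the crawler already holds each page's links and raw HTML in memory. The function name `build_response`, the `include_body` flag, and the `pages` structure are hypothetical and only illustrate the requested behaviour; they are not the project's existing API.

```python
import json
from typing import Dict, List, Tuple


def build_response(
    pages: Dict[str, Tuple[List[str], str]],
    include_body: bool = False,  # hypothetical flag; defaults to False as requested
) -> Dict[str, Dict[str, object]]:
    """Map each crawled URL to its links, plus its HTML body when enabled."""
    response: Dict[str, Dict[str, object]] = {}
    for url, (links, html) in pages.items():
        entry: Dict[str, object] = {"urls": list(links)}
        if include_body:
            # Only attach the body when the caller opts in.
            entry["body"] = html
        response[url] = entry
    return response


# Hypothetical crawl data: url -> (links found on the page, raw HTML)
pages = {
    "http://github.com": (
        ["http://github.com/", "https://githubuniverse.com/"],
        "<html>stuff</html>",
    ),
    "https://github.com/solutions/ci-cd": (
        ["https://github.com/solutions/ci-cd/", "https://githubuniverse.com/"],
        "<html>other stuff</html>",
    ),
}

print(json.dumps(build_response(pages, include_body=True), indent=4))
```

Leaving the flag at its default keeps the existing output shape (only `urls`), so current consumers of the JSON would not be affected.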
