
robotstxt #25

@petermeissner

    1. What does this package do? (explain in 50 words or less)

Web scraping allows one to gather information of scientific value, mainly social-science related in my experience. While scraping web pages, one should respect the permissions declared in robots.txt files.
The package provides functions to retrieve and parse robots.txt files. Its core functionality is checking a bot's or user's permission to access one or more resources (paths) on a given domain. To ease checking, all functions have been bundled with the relevant data into an R6 robotstxt class, but everything works both functionally and object-oriented, depending on the user's preference.
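As a rough sketch of the two styles, assuming the R6 class is instantiated the usual way and exposes a check method (the constructor and function names here are illustrative, not confirmed API):

```r
library(robotstxt)

## object-oriented style: one R6 object bundles the robots.txt data
## and the permission check (names assumed for illustration)
rt <- robotstxt$new(domain = "example.com")
rt$check(paths = c("/", "/private/page.html"), bot = "mybot")

## functional style: retrieve and parse in separate steps
## (get_robotstxt()/parse_robotstxt() as suggested by the description)
txt    <- get_robotstxt(domain = "example.com")
parsed <- parse_robotstxt(txt)
```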

    2. Paste the full DESCRIPTION file inside a code block (bounded by ``` on either end).

```
Package: robotstxt
Type: Package
Title: A 'robots.txt' Parser and Webbot/Webspider/Webcrawler Permissions Checker
Version: 0.1.0
Author: Peter Meissner
Maintainer: Peter Meissner <retep.meissner@gmail.com>
Description: Class ('R6') and accompanying methods to
    parse and check 'robots.txt' files. Data fields are provided as
    data frames and vectors. Permissions can be checked by providing
    path character vectors and optional bot names.
License: MIT + file LICENSE
LazyData: TRUE
BugReports: https://github.com/petermeissner/robotstxt/issues
URL: https://github.com/petermeissner/robotstxt
Imports:
    R6 (>= 2.1.1),
    stringr (>= 1.0.0),
    httr (>= 1.0.0)
Suggests:
    knitr,
    rmarkdown,
    dplyr,
    testthat
Depends:
    R (>= 3.0.0)
VignetteBuilder: knitr
RoxygenNote: 5.0.1
```
    3. URL for the package (the development repository, not a stylized html page)

https://github.com/petermeissner/robotstxt

    4. What data source(s) does it work with (if applicable)?

robots.txt files.
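For illustration, a minimal hypothetical robots.txt in the standard robots exclusion syntax (the rules shown here are invented):

```
User-agent: *
Disallow: /private/
Allow: /

User-agent: mybot
Disallow: /
```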

    5. Who is the target audience?

Package developers and users who want an easy way to be nice while gathering data from the web.

    6. Are there other R packages that accomplish the same thing? If so, what is different about yours?

None that I know of.

    7. Check the box next to each policy below, confirming that you agree. These are mandatory.
  • This package does not violate the Terms of Service of any service it interacts with
  • The repository has continuous integration with Travis and/or another service: https://travis-ci.org/petermeissner/robotstxt
  • The package contains a vignette
  • The package contains a reasonably complete readme with devtools install instructions (a typical install call is sketched after this list)
  • The package contains unit tests
  • The package only exports functions to the NAMESPACE that are intended for end users
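For reference, the usual devtools installation from the development repository given above would look like this (a standard pattern, not quoted from the readme):

```r
# install the development version from GitHub
# install.packages("devtools")   # if devtools is missing
devtools::install_github("petermeissner/robotstxt")
```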
    8. Do you agree to follow the rOpenSci packaging guidelines?

Yes, good guidelines!

  • Are there any package dependencies not on CRAN?

No.

  • Do you intend for this package to go on CRAN?

Yes, with or without rOpenSci.

  • Does the package have a CRAN accepted license?

Yes, MIT.

  • Did devtools::check() produce any errors or warnings? If so paste them below.

No:

```
* DONE
Status: OK

R CMD check succeeded
```
    9. Please add explanations below for any exceptions to the above:

Does not apply.

    10. If this is a resubmission following rejection, please explain the change in circumstances.

No, this is not a resubmission.
