Closed
Labels: bug, t-tooling
Description
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/utils
Issue description
Load a robots.txt containing only Disallow rules and check any non-matching URL. isAllowed() should return true, but returns false.
This happens because the underlying robots-parser package returns undefined for URLs that are not covered by the robots.txt. The RobotsFile class converts that undefined to false, which is wrong: robots.txt defines exclusion rules, so non-matching URLs should be allowed.
Either undefined should be converted to true, or the wrapping method should also return undefined.
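For illustration, a minimal sketch of the first option, assuming the wrapper holds a robots-parser instance in a private field (the sketch class and its internals are hypothetical, not the actual RobotsFile implementation):

import robotsParser from 'robots-parser';

// Hypothetical stand-in for RobotsFile, only to illustrate the fix.
class RobotsFileSketch {
    private constructor(private readonly robots: ReturnType<typeof robotsParser>) {}

    static from(url: string, content: string): RobotsFileSketch {
        return new RobotsFileSketch(robotsParser(url, content));
    }

    isAllowed(url: string, userAgent = '*'): boolean {
        // robots-parser returns undefined for URLs it has no verdict on;
        // robots.txt defines exclusions, so default those to allowed.
        return this.robots.isAllowed(url, userAgent) ?? true;
    }
}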
Code sample
import { RobotsFile } from '@crawlee/utils';

const robotsTxt = `
User-agent: *
Disallow: /private
`;
const robots = RobotsFile.from('https://example.com', robotsTxt);
robots.isAllowed('https://example.com/allowed'); // returns false, should return true
Package version
3.9.2
Node.js version
v21.7.3
Operating system
No response
Apify platform
- Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response