# Robots Parser [![NPM downloads](https://img.shields.io/npm/dm/robots-parser)](https://www.npmjs.com/package/robots-parser) [![DeepScan grade](https://deepscan.io/api/teams/457/projects/16277/branches/344939/badge/grade.svg)](https://deepscan.io/dashboard#view=project&tid=457&pid=16277&bid=344939) [![GitHub license](https://img.shields.io/github/license/samclarke/robots-parser.svg)](https://github.com/samclarke/robots-parser/blob/master/license.md) [![Coverage Status](https://coveralls.io/repos/github/samclarke/robots-parser/badge.svg?branch=master)](https://coveralls.io/github/samclarke/robots-parser?branch=master)

NodeJS robots.txt parser.

It currently supports:

- User-agent:
- Allow:
- Disallow:
- Sitemap:
- Crawl-delay:
- Host:
- Paths with wildcards (\*) and EOL matching ($) (see the example under Usage)

## Installation

Via NPM:

    npm install robots-parser

or via Yarn:

    yarn add robots-parser

## Usage

```js
var robotsParser = require('robots-parser');

var robots = robotsParser(
	'http://www.example.com/robots.txt',
	[
		'User-agent: *',
		'Disallow: /dir/',
		'Disallow: /test.html',
		'Allow: /dir/test.html',
		'Allow: /test.html',
		'Crawl-delay: 1',
		'Sitemap: http://example.com/sitemap.xml',
		'Host: example.com'
	].join('\n')
);

robots.isAllowed('http://www.example.com/test.html', 'Sams-Bot/1.0'); // true
robots.isAllowed('http://www.example.com/dir/test.html', 'Sams-Bot/1.0'); // true
robots.isDisallowed('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'); // true
robots.getCrawlDelay('Sams-Bot/1.0'); // 1
robots.getSitemaps(); // ['http://example.com/sitemap.xml']
robots.getPreferredHost(); // example.com
```
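
As listed above, paths may use wildcards (`*`) and end-of-line anchors (`$`). Below is a minimal sketch of how such rules behave; the rules and URLs here are illustrative, assuming the Google-style matching noted in the 2.2.0 changelog below:

```js
var wildcardRobots = robotsParser(
	'http://www.example.com/robots.txt',
	[
		'User-agent: *',
		'Disallow: /*.php$', // disallow any path ending in .php
		'Disallow: /private*' // disallow any path starting with /private
	].join('\n')
);

wildcardRobots.isAllowed('http://www.example.com/index.php', 'Sams-Bot/1.0'); // false
wildcardRobots.isAllowed('http://www.example.com/index.php?a=1', 'Sams-Bot/1.0'); // true ($ anchors the match at the end)
wildcardRobots.isAllowed('http://www.example.com/private/notes.html', 'Sams-Bot/1.0'); // false
```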

### isAllowed(url, [ua])

**boolean or undefined**

Returns true if crawling the specified URL is allowed for the specified user-agent.

This will return `undefined` if the URL isn't valid for this robots.txt.
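
For example, with the robots.txt from the usage example above (the second call assumes the documented behavior for a URL on a different host to the robots.txt URL):

```js
robots.isAllowed('http://www.example.com/dir/test.html', 'Sams-Bot/1.0'); // true (Allow: /dir/test.html)
robots.isAllowed('http://other.example.net/test.html', 'Sams-Bot/1.0'); // undefined (URL not valid for this robots.txt)
```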

### isDisallowed(url, [ua])

**boolean or undefined**

Returns true if crawling the specified URL is not allowed for the specified user-agent.

This will return `undefined` if the URL isn't valid for this robots.txt.
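
Continuing the usage example above:

```js
robots.isDisallowed('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'); // true (matches Disallow: /dir/)
```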

### getMatchingLineNumber(url, [ua])

**number or undefined**

Returns the line number of the matching directive for the specified URL and user-agent if any.

Line numbers start at 1 and go up (1-based indexing).

Returns -1 if there is no matching directive. If a rule is manually added without a lineNumber, this will return undefined for that rule.
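
For example, in the robots.txt from the usage example above, `Disallow: /dir/` sits on line 2. A short sketch based on the documented 1-based numbering:

```js
robots.getMatchingLineNumber('http://www.example.com/dir/test2.html', 'Sams-Bot/1.0'); // 2
robots.getMatchingLineNumber('http://www.example.com/other.html', 'Sams-Bot/1.0'); // -1 (no matching directive)
```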

### getCrawlDelay([ua])

**number or undefined**

Returns the number of seconds the specified user-agent should wait between requests.

Returns undefined if no crawl delay has been specified for this user-agent.
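
For example, the usage robots.txt above declares `Crawl-delay: 1` under `User-agent: *`; the second parser here uses a hypothetical robots.txt with no delay directive:

```js
robots.getCrawlDelay('Sams-Bot/1.0'); // 1

// Hypothetical robots.txt without a Crawl-delay directive:
var noDelay = robotsParser('http://www.example.com/robots.txt', 'User-agent: *\nDisallow: /private/');
noDelay.getCrawlDelay('Sams-Bot/1.0'); // undefined
```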

### getSitemaps()

**array**

Returns an array of sitemap URLs specified by the `sitemap:` directive.

### getPreferredHost()

**string or null**

Returns the preferred host name specified by the `host:` directive or null if there isn't one.

# Changes

### Version 2.3.0:

- Fixed bug where if the user-agent passed to `isAllowed()` / `isDisallowed()` is called "constructor" it would throw an error.
- Added support for relative URLs. This does not affect the default behavior so can safely be upgraded.

  Relative matching is only allowed if both the robots.txt URL and the URLs being checked are relative.

  For example:

  ```js
  var robots = robotsParser(
  	'/robots.txt',
  	[
  		'User-agent: *',
  		'Disallow: /dir/',
  		'Disallow: /test.html',
  		'Allow: /dir/test.html',
  		'Allow: /test.html'
  	].join('\n')
  );

  robots.isAllowed('/test.html', 'Sams-Bot/1.0'); // false
  robots.isAllowed('/dir/test.html', 'Sams-Bot/1.0'); // true
  robots.isDisallowed('/dir/test2.html', 'Sams-Bot/1.0'); // true
  ```

### Version 2.2.0:

- Fixed a bug with matching wildcard patterns against some URLs &ndash; Thanks to @ckylape for reporting and fixing
- Changed matching algorithm to match Google's implementation in google/robotstxt
- Changed order of precedence to match the current spec

### Version 2.1.1:

- Fixed a bug that could be used to cause rule checking to take a long time &ndash; Thanks to @andeanfog

### Version 2.1.0:

- Removed use of the punycode module APIs as the new URL API handles it
- Improved test coverage
- Added tests for percent-encoded paths and improved support
- Added `getMatchingLineNumber()` method
- Fixed a bug with comments on the same line as a directive

### Version 2.0.0:

This release is not 100% backwards compatible as it now uses the new URL APIs, which are not supported in Node < 7.

- Updated code to not use deprecated URL module APIs &ndash; Thanks to @kdzwinel

### Version 1.0.2:

- Fixed error caused by invalid URLs missing the protocol.

### Version 1.0.1:

- Fixed bug with the "user-agent" rule being treated as case sensitive &ndash; Thanks to @brendonboshell
- Improved test coverage &ndash; Thanks to @schornio

### Version 1.0.0:

- Initial release.

# License

The MIT License (MIT)

Copyright (c) 2014 Sam Clarke

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.