@@ -47,7 +47,7 @@ to scrape from there (note that all keys and values in the data you pass must
 be strings)::
 
     >>> url1 = 'http://pypi.python.org/pypi/w3lib'
-    >>> data = {'name': 'w3lib 1.0 ', 'author': 'Scrapy project', 'description': 'Library of web-related functions'}
+    >>> data = {'name': 'w3lib 1.1 ', 'author': 'Scrapy project', 'description': 'Library of web-related functions'}
     >>> s.train(url1, data)
 
 Finally, tell the scraper to scrape any other similar page and it will return
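The context line above notes that all keys and values in the training data must be strings. A minimal standalone sanity check for that constraint (plain Python; the helper name is hypothetical and not part of scrapely's API):

```python
def validate_training_data(data):
    """Check that a scrapely training mapping uses only string keys
    and string values, per the note in the README; raise otherwise."""
    for key, value in data.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError('keys and values must be strings: %r -> %r'
                            % (key, value))
    return data

# The same mapping the diff above trains with (version bumped to 1.1).
data = validate_training_data({
    'name': 'w3lib 1.1 ',
    'author': 'Scrapy project',
    'description': 'Library of web-related functions',
})
```

A non-string value such as `{'version': 1.1}` would raise `TypeError` before reaching `s.train(url1, data)`.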
@@ -101,27 +101,27 @@ To list available templates from a scraper::
 
 To add a new annotation, you usually test the selection criteria first::
 
-    scrapely> a 0 w3lib 1.0
-    [0] u'<a href="/pypi/w3lib/1.0 ">w3lib 1.0 </a>'
-    [1] u'<h1>w3lib 1.0 </h1>'
-    [2] u'<title>Python Package Index : w3lib 1.0 </title>'
+    scrapely> a 0 w3lib 1.1
+    [0] u'<a href="/pypi/w3lib/1.1 ">w3lib 1.1 </a>'
+    [1] u'<h1>w3lib 1.1 </h1>'
+    [2] u'<title>Python Package Index : w3lib 1.1 </title>'
 
 You can refine by position. To take the one in position [1]::
 
-    scrapely> a 0 w3lib 1.0 -n 1
-    [0] u'<h1>w3lib 1.0 </h1>'
+    scrapely> a 0 w3lib 1.1 -n 1
+    [0] u'<h1>w3lib 1.1 </h1>'
 
 To annotate some fields on the template::
 
-    scrapely> a 0 w3lib 1.0 -n 1 -f name
-    [new] (name) u'<h1>w3lib 1.0 </h1>'
+    scrapely> a 0 w3lib 1.1 -n 1 -f name
+    [new] (name) u'<h1>w3lib 1.1 </h1>'
 
     scrapely> a 0 Scrapy project -n 0 -f author
     [new] u'<span>Scrapy project</span>'
 
 To list annotations on a template::
 
     scrapely> al 0
-    [0-0] (name) u'<h1>w3lib 1.0 </h1>'
+    [0-0] (name) u'<h1>w3lib 1.1 </h1>'
     [0-1] (author) u'<span>Scrapy project</span>'
 
 To scrape another similar page with the already added templates::
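For context on the candidate list shown above: the `a` command searches the template page for elements whose text matches the given value. A rough standalone approximation of that search (plain Python regex; this is an illustrative sketch, not scrapely's actual matching logic):

```python
import re

def candidate_elements(html, text):
    """Return simple tag pairs whose inner text contains `text`,
    roughly mimicking the list printed by "scrapely> a 0 w3lib 1.1"."""
    pattern = re.compile(r'<(\w+)[^>]*>[^<]*%s[^<]*</\1>' % re.escape(text))
    return [m.group(0) for m in pattern.finditer(html)]

# A fragment resembling the PyPI page used as the template above.
page = ('<a href="/pypi/w3lib/1.1 ">w3lib 1.1 </a>'
        '<h1>w3lib 1.1 </h1>'
        '<title>Python Package Index : w3lib 1.1 </title>')

for i, element in enumerate(candidate_elements(page, 'w3lib 1.1')):
    print('[%d] %r' % (i, element))
```

This prints three candidates in document order, matching the `[0]`/`[1]`/`[2]` listing in the shell session; the `-n 1` flag then narrows the selection to the `<h1>` element.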