Commit f8ba90a

update readme
1 parent 97021c7 commit f8ba90a

1 file changed

+113
-6
lines changed

README.md

Lines changed: 113 additions & 6 deletions
@@ -27,21 +27,63 @@ DOWNLOADER_MIDDLEWARES = {
2727
}
2828
```
2929

30-
Others optional settings:
30+
Congratulations, you've finished all of the required configuration.
31+
32+
If you run the Spider again, Pyppeteer will start and render every
33+
web page whose request you configured as a PyppeteerRequest.
34+
35+
## Settings
36+
37+
GerapyPyppeteer provides some optional settings.
38+
39+
### Logging Level
40+
41+
By default, Pyppeteer logs all debug messages, so GerapyPyppeteer
42+
sets the logging level of Pyppeteer to WARNING.
43+
44+
If you want to see more logs from Pyppeteer, you can change this setting:
3145

3246
```python
33-
# pyppeteer logging level
34-
GERAPY_PYPPETEER_LOGGING_LEVEL = logging.WARNING
47+
import logging
48+
GERAPY_PYPPETEER_LOGGING_LEVEL = logging.DEBUG
49+
```
50+
51+
### Download Timeout
52+
53+
Pyppeteer may take some time to render the required web page. You can change the download timeout with this setting; the default is `30s`:
3554

55+
```python
3656
# pyppeteer timeout
3757
GERAPY_PYPPETEER_DOWNLOAD_TIMEOUT = 30
58+
```
59+
60+
### Headless
3861

39-
# pyppeteer browser window
62+
By default, Pyppeteer runs in headless mode (the default is `True`).
63+
Set it to `False` if you need to see the browser window:
64+
65+
```python
66+
GERAPY_PYPPETEER_HEADLESS = False
67+
```
68+
69+
### Window Size
70+
71+
You can also set the width and height of the Pyppeteer window:
72+
73+
```python
4074
GERAPY_PYPPETEER_WINDOW_WIDTH = 1400
4175
GERAPY_PYPPETEER_WINDOW_HEIGHT = 700
76+
```
77+
78+
The defaults are 1400 (width) and 700 (height).
79+
80+
### Pyppeteer Args
81+
82+
You can also change the args of Pyppeteer, such as `dumpio`, `devtools`, etc.
4283

43-
# pyppeteer settings
44-
GERAPY_PYPPETEER_HEADLESS = True
84+
Optional settings and their default values:
85+
86+
```python
4587
GERAPY_PYPPETEER_DUMPIO = False
4688
GERAPY_PYPPETEER_DEVTOOLS = False
4789
GERAPY_PYPPETEER_EXECUTABLE_PATH = None
@@ -53,6 +95,71 @@ GERAPY_PYPPETEER_DISABLE_SETUID_SANDBOX = True
5395
GERAPY_PYPPETEER_DISABLE_GPU = True
5496
```
5597

98+
### Disable loading of specific resource types
99+
100+
You can disable the loading of specific resource types to
101+
decrease the loading time of a web page. Configure
102+
the disabled resource types using `GERAPY_IGNORE_RESOURCE_TYPES`:
103+
104+
```python
105+
GERAPY_IGNORE_RESOURCE_TYPES = []
106+
```
107+
108+
For example, if you want to disable the loading of CSS and JavaScript,
109+
set it as below:
110+
111+
```python
112+
GERAPY_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']
113+
```
114+
115+
The full list of available resource types:
116+
117+
* document: the original HTML document
118+
* stylesheet: CSS files
119+
* script: JavaScript files
120+
* image: Images
121+
* media: Media files such as audio or video
122+
* font: Font files
123+
* texttrack: Text Track files
124+
* xhr: Ajax Requests
125+
* fetch: Fetch Requests
126+
* eventsource: Event Source
127+
* websocket: WebSocket
128+
* manifest: Manifest files
129+
* other: Other files
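A misspelled entry such as `'css'` instead of `'stylesheet'` is easy to overlook. The sketch below checks a list against the resource types listed above. The helper `unknown_resource_types` is hypothetical and not part of GerapyPyppeteer; it only illustrates the idea.

```python
# Resource types taken from the list above.
RESOURCE_TYPES = {
    'document', 'stylesheet', 'script', 'image', 'media', 'font',
    'texttrack', 'xhr', 'fetch', 'eventsource', 'websocket',
    'manifest', 'other',
}


def unknown_resource_types(ignored):
    """Return the entries of `ignored` that are not known resource types.

    Hypothetical helper for illustration; not a GerapyPyppeteer API.
    """
    return sorted(set(ignored) - RESOURCE_TYPES)


# Value from the example above: every entry is a known type.
GERAPY_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']
print(unknown_resource_types(GERAPY_IGNORE_RESOURCE_TYPES))

# A list with a misspelled type reports the offending entry.
print(unknown_resource_types(['css', 'image']))
```

Running such a check in `settings.py` or in a test is one possible way to catch typos before a crawl starts.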
130+
131+
## Settings for each Pyppeteer Request
132+
133+
`PyppeteerRequest` provides arguments that can override the global settings above.
134+
135+
* wait_until: one of "load", "domcontentloaded", "networkidle0", "networkidle2".
136+
See https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goto for details. The default is `domcontentloaded`.
137+
* wait_for: wait for some element to load
138+
* script: script to execute after the page has loaded
139+
* sleep: time to sleep, in seconds, after the page has loaded
140+
* ignore_resource_types: ignored resource types
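The override rule can be pictured as follows. This is a minimal sketch that assumes an argument left unset (`None`) falls back to the global setting; the function `effective_option` is hypothetical and does not show how GerapyPyppeteer is actually implemented.

```python
def effective_option(request_value, global_value):
    """Return the per-request value when given, else the global setting.

    Hypothetical helper; assumes `None` means "not set on the request".
    """
    return global_value if request_value is None else request_value


# Global setting, using the value from the example in the previous section.
GERAPY_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']

# No per-request value: the global setting applies.
print(effective_option(None, GERAPY_IGNORE_RESOURCE_TYPES))

# A per-request value takes precedence over the global setting.
print(effective_option(['image'], GERAPY_IGNORE_RESOURCE_TYPES))
```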
141+
142+
For example, you can configure PyppeteerRequest as:
143+
144+
```python
145+
from gerapy_pyppeteer import PyppeteerRequest
146+
147+
def parse(self, response):
148+
    yield PyppeteerRequest(url,
149+
        callback=self.parse_detail,
150+
        wait_until='domcontentloaded',
151+
        wait_for='title',
152+
        script='() => { console.log(document) }',
153+
        sleep=2)
154+
```
155+
156+
Then Pyppeteer will:
157+
* wait for document to load
158+
* wait for title to load
159+
* execute `console.log(document)` script
160+
* sleep for 2s
161+
* return the rendered web page content
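The order of these steps can be sketched against Pyppeteer's `Page` methods (`goto`, `waitForSelector`, `evaluate`, `content`). The function `render` and the `FakePage` stand-in below are hypothetical and only illustrate the sequence; they are not GerapyPyppeteer's actual middleware code. With real Pyppeteer, `page` would come from `browser.newPage()`.

```python
import asyncio


async def render(page, url, wait_until='domcontentloaded',
                 wait_for=None, script=None, sleep=None):
    """Apply the per-request options in the order listed above."""
    # 1. Navigate and wait for the chosen load event.
    await page.goto(url, waitUntil=wait_until)
    # 2. Optionally wait for a selector to appear.
    if wait_for:
        await page.waitForSelector(wait_for)
    # 3. Optionally run a script in the page.
    if script:
        await page.evaluate(script)
    # 4. Optionally pause before reading the content.
    if sleep:
        await asyncio.sleep(sleep)
    # 5. Return the rendered HTML.
    return await page.content()


class FakePage:
    """Stand-in that records calls, so the sketch runs without a browser."""

    def __init__(self):
        self.calls = []

    async def goto(self, url, **kwargs):
        self.calls.append(('goto', url, kwargs.get('waitUntil')))

    async def waitForSelector(self, selector):
        self.calls.append(('waitForSelector', selector))

    async def evaluate(self, script):
        self.calls.append(('evaluate', script))

    async def content(self):
        self.calls.append(('content',))
        return '<html><title>demo</title></html>'


page = FakePage()
html = asyncio.run(render(page, 'https://example.com',
                          wait_for='title',
                          script='() => { console.log(document) }',
                          sleep=0.01))
# Print the method names in the order they were called.
print([call[0] for call in page.calls])
```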
162+
56163
## Example
57164

58165
For more detail, please see [example](./example).
