@@ -27,21 +27,63 @@ DOWNLOADER_MIDDLEWARES = {
}
```

- Others optional settings:
+ Congratulations, you've finished all of the required configuration.
+
+ If you run the Spider again, Pyppeteer will be started to render every
+ web page whose request you configured as a PyppeteerRequest.
+
+ ## Settings
+
+ GerapyPyppeteer provides some optional settings.
+
+ ### Logging Level
+
+ By default, Pyppeteer logs all debug messages, so GerapyPyppeteer
+ sets the logging level of Pyppeteer to WARNING.
+
+ If you want to see more logs from Pyppeteer, you can change this setting:

```python
- # pyppeteer logging level
- GERAPY_PYPPETEER_LOGGING_LEVEL = logging.WARNING
+ import logging
+ GERAPY_PYPPETEER_LOGGING_LEVEL = logging.DEBUG
+ ```
+
+ ### Download Timeout
+
+ Pyppeteer may take some time to render the required web page. You can change the timeout with this setting; the default is `30s`:

+ ```python
# pyppeteer timeout
GERAPY_PYPPETEER_DOWNLOAD_TIMEOUT = 30
+ ```
+
+ ### Headless

- # pyppeteer browser window
+ By default, Pyppeteer runs in `Headless` mode. You can
+ set this to `False` if you need a visible browser window; the default is `True`:
+
+ ```python
+ GERAPY_PYPPETEER_HEADLESS = False
+ ```
+
+ ### Window Size
+
+ You can also set the width and height of the Pyppeteer window:
+
+ ```python
GERAPY_PYPPETEER_WINDOW_WIDTH = 1400
GERAPY_PYPPETEER_WINDOW_HEIGHT = 700
+ ```
+
+ The defaults are 1400 (width) and 700 (height).
+
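As a rough illustration of how the settings above fit together, here is a sketch of a local debugging profile for a Scrapy `settings.py`. The setting names come from the sections above; the 60-second timeout is an arbitrary illustrative value, not a recommendation from the project.

```python
# settings.py (sketch): a local debugging profile for GerapyPyppeteer.
import logging

# Show Pyppeteer's debug output instead of the WARNING level used by default.
GERAPY_PYPPETEER_LOGGING_LEVEL = logging.DEBUG

# Give slow pages more time to render (seconds; 60 is an illustrative value,
# the documented default is 30).
GERAPY_PYPPETEER_DOWNLOAD_TIMEOUT = 60

# Open a visible browser window rather than running headless.
GERAPY_PYPPETEER_HEADLESS = False

# Browser window size (these are the documented defaults).
GERAPY_PYPPETEER_WINDOW_WIDTH = 1400
GERAPY_PYPPETEER_WINDOW_HEIGHT = 700
```

A profile like this is probably only useful during development; for unattended crawls the headless default is usually the better fit.
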
+ ### Pyppeteer Args
+
+ You can also change the args of Pyppeteer, such as `dumpio`, `devtools`, etc.

- # pyppeteer settings
- GERAPY_PYPPETEER_HEADLESS = True
+ Optional settings and their default values:
+
+ ```python
GERAPY_PYPPETEER_DUMPIO = False
GERAPY_PYPPETEER_DEVTOOLS = False
GERAPY_PYPPETEER_EXECUTABLE_PATH = None
@@ -53,6 +95,71 @@ GERAPY_PYPPETEER_DISABLE_SETUID_SANDBOX = True
GERAPY_PYPPETEER_DISABLE_GPU = True
```

+ ### Disable loading of specific resource types
+
+ You can disable the loading of specific resource types to
+ decrease the loading time of a web page. You can configure
+ the disabled resource types using `GERAPY_IGNORE_RESOURCE_TYPES`:
+
+ ```python
+ GERAPY_IGNORE_RESOURCE_TYPES = []
+ ```
+
+ For example, if you want to disable the loading of CSS and JavaScript,
+ you can set it as below:
+
+ ```python
+ GERAPY_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']
+ ```
+
+ The full list of supported resource types:
+
+ * document: the original HTML document
+ * stylesheet: CSS files
+ * script: JavaScript files
+ * image: Images
+ * media: Media files such as audio or video
+ * font: Font files
+ * texttrack: Text track files
+ * xhr: Ajax requests
+ * fetch: Fetch requests
+ * eventsource: Event source
+ * websocket: WebSocket
+ * manifest: Manifest files
+ * other: Other files
+
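Because the resource type names are plain strings, a typo such as `'stylesheets'` could easily go unnoticed. The sketch below shows one way to check the list before assigning it in `settings.py`. `KNOWN_RESOURCE_TYPES` and `check_resource_types` are hypothetical names introduced here for illustration; they are not part of GerapyPyppeteer.

```python
# Resource type names copied from the list above.
KNOWN_RESOURCE_TYPES = frozenset({
    'document', 'stylesheet', 'script', 'image', 'media', 'font',
    'texttrack', 'xhr', 'fetch', 'eventsource', 'websocket',
    'manifest', 'other',
})


def check_resource_types(resource_types):
    """Return the list unchanged, or raise ValueError naming unknown entries.

    Hypothetical helper for illustration; not part of GerapyPyppeteer.
    """
    unknown = sorted(set(resource_types) - KNOWN_RESOURCE_TYPES)
    if unknown:
        raise ValueError(f'unknown resource types: {unknown}')
    return list(resource_types)


# Usage in settings.py: skip CSS and images, failing fast on a typo.
GERAPY_IGNORE_RESOURCE_TYPES = check_resource_types(['stylesheet', 'image'])
```

Failing at import time keeps a misspelled type from silently having no effect during a crawl.
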
+ ## Settings for each Pyppeteer Request
+
+ `PyppeteerRequest` provides args which can override the global settings above.
+
+ * wait_until: one of "load", "domcontentloaded", "networkidle0", "networkidle2";
+   see https://miyakogi.github.io/pyppeteer/reference.html#pyppeteer.page.Page.goto. The default is `domcontentloaded`.
+ * wait_for: wait for the given element to load
+ * script: script to execute after the page has loaded
+ * sleep: time to sleep after the page has loaded, in seconds
+ * ignore_resource_types: resource types to ignore for this request
+
+ For example, you can configure PyppeteerRequest as:
+
+ ```python
+ from gerapy_pyppeteer import PyppeteerRequest
+
+ def parse(self, response):
+     # url is the target page URL, defined elsewhere in the spider
+     yield PyppeteerRequest(url,
+         callback=self.parse_detail,
+         wait_until='domcontentloaded',
+         wait_for='title',
+         script='() => { console.log(document) }',
+         sleep=2)
+ ```
+
+ Then Pyppeteer will:
+ * wait for the document to load
+ * wait for the title element to load
+ * execute the `console.log(document)` script
+ * sleep for 2s
+ * return the rendered web page content
+
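The idea that a per-request argument overrides a global setting can be sketched in plain Python. This is only an illustration of one plausible precedence rule, assuming that an argument left as `None` means "not set on the request"; `effective_option` is a hypothetical helper, and the library's actual resolution logic may differ.

```python
def effective_option(request_value, global_value):
    """Prefer the per-request value; fall back to the global setting.

    Hypothetical helper: None is assumed to mean "not set on the request".
    """
    return global_value if request_value is None else request_value


# Global setting, as it might appear in settings.py.
GERAPY_IGNORE_RESOURCE_TYPES = ['stylesheet', 'script']

# A request that sets ignore_resource_types replaces the global list ...
per_request = effective_option(['image'], GERAPY_IGNORE_RESOURCE_TYPES)

# ... while a request that leaves it unset inherits the global list.
inherited = effective_option(None, GERAPY_IGNORE_RESOURCE_TYPES)
```

Under this rule an explicit empty list on the request would re-enable every resource type for that request, which is why the check is against `None` rather than truthiness.
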
## Example

For more detail, please see [example](./example).