Commit 16ac2a2: Docs for PLAYWRIGHT_PROCESS_REQUEST_HEADERS

elacuesta committed Jan 26, 2022 (parent 53e9259)
Showing 1 changed file (README.md) with 21 additions and 2 deletions.
the default value will be used (30000 ms at the time of writing this).
See the docs for [BrowserContext.set_default_navigation_timeout](https://playwright.dev/python/docs/api/class-browsercontext#browser_contextset_default_navigation_timeouttimeout).
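For instance, a minimal sketch in `settings.py`, assuming the paragraph above describes the `PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT` setting (Playwright timeouts are expressed in milliseconds):

```python
# settings.py (sketch): set the default navigation timeout to 10 seconds.
# Playwright expects the value in milliseconds.
PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT = 10 * 1000
```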

* `PLAYWRIGHT_PROCESS_REQUEST_HEADERS` (type `str|Callable`, default `scrapy_playwright.headers.use_scrapy_headers`)

A coroutine function (`async def`), or the path to one, that processes headers for a given request
and returns a dictionary with the headers to be used (note that, depending on the browser, additional
default headers will be sent as well).

The function must return a `dict` object, and receives the following keyword arguments:

```python
browser_type: str, playwright_request: playwright.async_api.Request, scrapy_headers: scrapy.http.headers.Headers
```

The default value (`scrapy_playwright.headers.use_scrapy_headers`) tries to emulate Scrapy's
behaviour for navigation requests, i.e. overriding headers with their values from the Scrapy request.
For non-navigation requests (e.g. images, stylesheets, scripts, etc.), only the `User-Agent` header
is overridden, for consistency.

Another function is available: `scrapy_playwright.headers.use_playwright_headers`,
which returns the headers from the Playwright request without any changes.
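A custom processor could be sketched as follows (the keyword arguments match those described above; the `X-Example` header and the stand-in request object are illustrative assumptions, not part of the library):

```python
import asyncio
import types

# Sketch of a custom header-processing coroutine for
# PLAYWRIGHT_PROCESS_REQUEST_HEADERS. It starts from the headers
# Playwright would send and adds one hypothetical extra header.
async def custom_headers(browser_type, playwright_request, scrapy_headers):
    headers = dict(playwright_request.headers)
    headers["X-Example"] = "example-value"  # hypothetical header
    return headers

# Quick local check with a stand-in for playwright.async_api.Request:
fake_request = types.SimpleNamespace(headers={"user-agent": "some-ua"})
result = asyncio.run(
    custom_headers(
        browser_type="chromium",
        playwright_request=fake_request,
        scrapy_headers={},
    )
)
print(result)  # {'user-agent': 'some-ua', 'X-Example': 'example-value'}
```

In `settings.py`, the setting can then point at the coroutine itself or at its import path as a string, e.g. `PLAYWRIGHT_PROCESS_REQUEST_HEADERS = "myproject.utils.custom_headers"` (the module path is hypothetical).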

## Basic usage

By default, outgoing requests include the `User-Agent` set by Scrapy (either with the
`USER_AGENT` or `DEFAULT_REQUEST_HEADERS` settings or via the `Request.headers` attribute).
This could cause some sites to react in unexpected ways, for instance if the user agent
does not match the running Browser. If you prefer the `User-Agent` sent by
default by the specific browser you're using, set the Scrapy user agent to `None`.
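For example, a minimal sketch in the project's `settings.py`:

```python
# settings.py: disable Scrapy's User-Agent so the browser's own
# default User-Agent header is sent instead.
USER_AGENT = None
```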


## Receiving the Page object in the callback
