Skip to content

fix: Fix the order in which cookies are saved to the SessionCookies and the handler is executed for PlaywrightCrawler#1163

Merged
Pijukatel merged 3 commits into
apify:masterfrom
Mantisus:fix-order-processing-pw-cookies
Apr 23, 2025
Merged

fix: Fix the order in which cookies are saved to the SessionCookies and the handler is executed for PlaywrightCrawler#1163
Pijukatel merged 3 commits into
apify:masterfrom
Mantisus:fix-order-processing-pw-cookies

Conversation

@Mantisus

@Mantisus Mantisus commented Apr 18, 2025

Copy link
Copy Markdown
Collaborator

Description

  • For PlaywrightCrawler, cookies should only be saved to the session store when the handler is fully executed. This is because the browser may continue to set cookies while the handler is being executed

Testing

  • Add a test simulating the installation of a cookie in the browser during the default_handler execution process
  • Update the test_isolation_cookies test

@Mantisus Mantisus changed the title fix: Fix the order in which cookies are saved to the session store and the handler is executed for PlaywrightCrawler fix: Fix the order in which cookies are saved to the SessionCookies and the handler is executed for PlaywrightCrawler Apr 18, 2025
@Mantisus Mantisus requested a review from Copilot April 18, 2025 18:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the order of cookie saving in the PlaywrightCrawler to ensure cookies are saved only after the request handler has fully executed.

  • Reorders the cookie saving logic in the _navigate function so that cookies are set after the handler processing.
  • Adds a new unit test to simulate and verify the correct cookie saving behavior.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
tests/unit/crawlers/_playwright/test_playwright_crawler.py Introduces a new test ensuring cookies are saved post-handler execution.
src/crawlee/crawlers/_playwright/_playwright_crawler.py Reorders the cookie saving logic to occur after the handler finishes.

@Mantisus Mantisus self-assigned this Apr 18, 2025
@Mantisus Mantisus requested review from Pijukatel and janbuchar April 18, 2025 18:13
@Mantisus Mantisus requested a review from Copilot April 18, 2025 18:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the order in which cookies are saved for PlaywrightCrawler by ensuring that session cookies are only set after the handler has fully executed.

  • Updated cookie saving logic in _playwright_crawler.py to run after handler processing.
  • Enhanced tests in test_playwright_crawler.py to verify that cookies are correctly stored post-handler execution.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
tests/unit/crawlers/_playwright/test_playwright_crawler.py Updated test code to record sessions and verify correct cookie saving order.
src/crawlee/crawlers/_playwright/_playwright_crawler.py Reordered the cookie saving logic to execute after the response is processed.
Comments suppressed due to low confidence (2)

tests/unit/crawlers/_playwright/test_playwright_crawler.py:360

  • The code assumes sessions_ids has at least three elements, but earlier assertions indicate only two unique sessions. Please verify that this indexing is intentional and update the test to accurately reflect the expected session count.
clean_session_id = sessions_ids[2]

tests/unit/crawlers/_playwright/test_playwright_crawler.py:360

  • [nitpick] The identifier 'clean_session_id' may be ambiguous; consider renaming it to a more descriptive name that clarifies its role in indicating the session state after handler processing.
clean_session_id = sessions_ids[2]

@Pijukatel Pijukatel merged commit 82ff69a into apify:master Apr 23, 2025
Mantisus added a commit to Mantisus/crawlee-python that referenced this pull request Apr 24, 2025
… and the handler is executed for `PlaywrightCrawler` (apify#1163)

### Description

- For `PlaywrightCrawler`, cookies should only be saved to the session
store when the handler is fully executed. This is because the browser
may continue to set cookies while the handler is being executed

### Testing

- Add a test simulating the installation of a cookie in the browser
during the `default_handler` execution process
- Update the `test_isolation_cookies` test
Mantisus added a commit to Mantisus/crawlee-python that referenced this pull request Apr 24, 2025
… and the handler is executed for `PlaywrightCrawler` (apify#1163)

### Description

- For `PlaywrightCrawler`, cookies should only be saved to the session
store when the handler is fully executed. This is because the browser
may continue to set cookies while the handler is being executed

### Testing

- Add a test simulating the installation of a cookie in the browser
during the `default_handler` execution process
- Update the `test_isolation_cookies` test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants