@ikreymer ikreymer commented Dec 31, 2024

Chromium now interrupts fetch() if abort() is called or the page is navigated, so autofetch behavior that relies on native fetch() is less than ideal. This PR adds support for a __bx_fetch() command for the autofetch behavior (supported in browsertrix-behaviors 0.6.6), which fetches separately from the browser's regular fetch().

  • __bx_fetch() starts a fetch but does not return content to the browser. It doesn't need abort() and is unaffected by page navigation, but it will still try to use the browser network stack when possible, making it more efficient for background fetching.
  • if the network stack fetch fails, fall back to the regular node fetch() in the crawler.

Additional improvements for interrupted fetches:
  • don't store truncated media responses, even when the status is 200
  • avoid duplicate async fetching if the response was already handled (e.g. a fetch handled in multiple contexts)
  • fixes "Autofetch behavior results in empty 200 responses" #735, where an interrupted fetch resulted in an empty response
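
The truncation and de-duplication rules above can be sketched roughly as follows. This is an illustration only, not the crawler's actual code: `CapturedResponse`, `isTruncated`, `AsyncFetchTracker`, and `decide` are hypothetical names, and the check assumes that a truncated body can be detected by comparing the raw transferred byte count against the `Content-Length` header, which will not catch every interrupted response.

```typescript
// Hypothetical shape of a captured response; not the crawler's real types.
interface CapturedResponse {
  url: string;
  status: number;
  headers: Record<string, string | undefined>; // header names assumed lowercased
  body: Uint8Array; // assumed to be the raw bytes as transferred
}

// True if the body is shorter than the declared Content-Length.
// Without a usable Content-Length header, truncation cannot be detected here.
function isTruncated(resp: CapturedResponse): boolean {
  const declared = Number.parseInt(resp.headers["content-length"] ?? "", 10);
  if (Number.isNaN(declared)) {
    return false;
  }
  return resp.body.byteLength < declared;
}

// Remembers which URLs already have an async fetch, so the same URL seen
// in several contexts is only fetched once.
class AsyncFetchTracker {
  private seen = new Set<string>();

  // Returns true only the first time a URL is claimed.
  claim(url: string): boolean {
    if (this.seen.has(url)) {
      return false;
    }
    this.seen.add(url);
    return true;
  }
}

// "store": keep the response as-is.
// "refetch": drop the truncated body and start one async fetch.
// "skip": drop the truncated body; an async fetch was already started.
function decide(
  resp: CapturedResponse,
  tracker: AsyncFetchTracker,
): "store" | "refetch" | "skip" {
  if (!isTruncated(resp)) {
    return "store";
  }
  return tracker.claim(resp.url) ? "refetch" : "skip";
}
```

In this sketch the status code is never consulted: a 200 response with a short body is still treated as truncated, matching the "even for 200" rule.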

- if a browser media fetch is interrupted (e.g. due to a behavior closing an element), don't store the truncated response, even if the status is 200
- do an async fetch for the same URL instead
- avoid duplicate async fetching if the response was already handled (e.g. a fetch handled in multiple contexts)
- add support for a __bx_fetch() command for the autofetch behavior (browsertrix-behaviors 0.6.6) to fetch separately from the browser's native fetch()
- __bx_fetch() starts a fetch but does not return content to the browser; it doesn't need abort() and is unaffected by page navigation, but will try to use the browser network stack when possible
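
The fallback order described here (browser network stack first, then a regular node fetch()) might look something like the sketch below. The names `Fetcher` and `fetchWithFallback` are hypothetical and do not come from the crawler; each fetcher is assumed to resolve to `true` when it captured a complete response and to resolve to `false` or throw otherwise.

```typescript
// Hypothetical fetcher signature: resolves to true on a complete capture.
type Fetcher = (url: string) => Promise<boolean>;

// Tries each fetcher in order and returns the index of the first one that
// succeeds, or -1 if all of them fail. A thrown error counts as a failure.
async function fetchWithFallback(
  url: string,
  fetchers: Fetcher[],
): Promise<number> {
  for (let i = 0; i < fetchers.length; i++) {
    const fetcher = fetchers[i];
    try {
      if (fetcher && (await fetcher(url))) {
        return i;
      }
    } catch {
      // Interrupted or failed: fall through to the next fetcher.
    }
  }
  return -1;
}
```

Under these assumptions, a caller would pass the network stack fetcher first and the node fetcher second, so the second only runs when the first fails.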
@ikreymer ikreymer requested a review from tw4l December 31, 2024 04:31
@tw4l tw4l left a comment

Looks good. I tested that the fix works as expected against the sample URL in the issue. Nice work!

@ikreymer ikreymer merged commit d923e11 into main Dec 31, 2024
4 checks passed
@ikreymer ikreymer deleted the skip-partial-responses branch December 31, 2024 21:52