-
Notifications
You must be signed in to change notification settings - Fork 510
feat: add transform_request_function parameter for SitemapRequestLoader
#1525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| exclude: list[re.Pattern[Any] | Glob] | None = None, | ||
| max_buffer_size: int = 200, | ||
| persist_state_key: str | None = None, | ||
| transform_request_function: Callable[[RequestOptions], RequestOptions | RequestTransformAction] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buuut... shouldn't this also receive an URL of the origin sitemap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that makes sense. A sitemap cannot contain links to another domain.
This way, users can easily create a mapping between the original link to the sitemap and the link inside transform_request_function. From my point of view, the most valuable thing that adding transform_request_function gives is the ability to add a label so that the request is processed by the appropriate handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes a lot of sense, thanks. But I'm afraid that this won't "click" for a lot of people. Perhaps we could add an example that showcases this?
| def transform_request( | ||
| request_options: RequestOptions, | ||
| ) -> RequestOptions | RequestTransformAction: | ||
| request_host = URL(request_options['url']).host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should mention that a sitemap should only contain links to the same host here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added.
Description
This PR is inspired by this discussion.
Add support
transform_request_functionforSitemapRequestLoader, which works the same way as inEnqueueLinksFunction. This can be useful for settinglabelfor correct routing oruser_datawith custom data.