add pagination support to list #26

Open

wants to merge 1 commit into main from add-pagination-support-to-list
Conversation

jacob-keller

This series adds support for handling the pagination Link headers when
listing patches. This allows handling responses which contain more than
30 patches.

To do this, I propose switching from urllib to the Python requests
library. Alternatively, we could process the Link headers manually and
continue using urllib without the extra dependency. That is possible but
could be somewhat tricky to get correct.

I only implemented Link handling in the _list implementation, since that
was the clearest place where it was needed. I'm not sure which other
API endpoints can return Link headers.

  • convert REST API to use requests library
  • Handle the pagination Link headers for _list requests
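
For reference, this is roughly what the requests-based approach looks like: requests parses the Link header into response.links, so following pagination is just a loop. This is a minimal sketch; the function name and URL handling are illustrative, not the PR's actual code.

```python
import requests


def list_all(url, params=None):
    """Fetch every page of a paginated list endpoint by following rel="next" links."""
    items = []
    while url:
        response = requests.get(url, params=params)
        response.raise_for_status()
        items.extend(response.json())
        # requests parses the Link header into response.links for us
        url = response.links.get('next', {}).get('url')
        params = None  # the "next" URL already carries its own query string
    return items
```

The urllib alternative means re-implementing that Link-header parsing by hand, which is what the later revision of this PR ends up doing (see the sketch after the commit message below).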

@stephenfin
Member

This looks mostly good. My one request is to avoid using requests if at all possible. Currently we have a single dependency, which is a backport of a stdlib library and is only necessary on Python 3.7 (which is now EOL). It would be good to keep it that way if possible. Given we only need to be concerned with our own handling of the Link header, I'm hoping this won't be too big an ask.

@stephenfin
Member

Also, you can fix the linters CI job by running pre-commit run -a.

@jacob-keller
Author

Makes sense. I'll have to dig into it a bit to see how to parse the header myself. I started that way but found it difficult to parse correctly (I somehow ended up with duplicate queries). It shouldn't be too tricky to figure out, though.

The patchwork REST API defaults to sending a maximum of 30 items for API
requests which return a list. If the request matches more than 30 items,
pwclient will only ever see the first page, confusing users who expect
to see the entire set of items.

To handle this, the API includes 'Link' headers in the response which
indicate whether there is more data, and if so, what URL the data is
available at.

Add a method to extract the page number query parameter from the Link
header URL if it exists. Use this to recursively call _list() until there
are no more pages to fetch.

Implement extraction of the page number as a new static method which deals
with the complexity of analyzing the Link headers. We only check the
first Link header, and only the first 'rel="next"' portion of it. Split
the discovered URL using urllib.parse.urlparse to find the query section
and locate the page=<value> bit.

This approach avoids adding a new dependency (such as the requests
library) and should work for valid Link headers provided by patchwork. If
at any point we fail to find the expected data, link parsing stops and we
return the set of items found so far.

This should fix pwclient to properly handle multiple pages of data. This
will likely cause some invocations of pwclient to slow down as they must
now query every available page. This is preferable to not returning the
available data.

A future improvement to reduce this overhead would be to extend -n and -N
to work with pages so that we avoid downloading unnecessary data.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
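
As a rough, stdlib-only sketch of the extraction described above, assuming a patchwork-style header such as <https://example.com/api/patches/?page=2>; rel="next" (the helper name and exact parsing rules here are illustrative and may differ from the PR's actual static method):

```python
import urllib.parse


def _get_next_page(link_header):
    """Return the 'page' query value from a Link header's rel="next" entry, or None.

    Returning None tells the caller to stop recursing and keep the items
    collected so far, matching the "stop on unexpected data" behaviour above.
    """
    if not link_header:
        return None
    for entry in link_header.split(','):
        segments = entry.split(';')
        if len(segments) < 2:
            continue
        url = segments[0].strip().lstrip('<').rstrip('>')
        if not any(seg.strip() == 'rel="next"' for seg in segments[1:]):
            continue
        # Pull page=<value> out of the query string of the "next" URL
        query = urllib.parse.urlparse(url).query
        params = urllib.parse.parse_qs(query)
        if 'page' in params:
            return params['page'][0]
    return None
```

A caller such as _list() would feed this value back in as the page parameter of the next request and merge the results, stopping once the helper returns None.
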
@jacob-keller jacob-keller force-pushed the add-pagination-support-to-list branch from 4f1adaa to d118865 on November 20, 2024 23:56
@jacob-keller jacob-keller marked this pull request as ready for review November 20, 2024 23:56
@jacob-keller
Author

@stephenfin I lost track of this, but I ran into the problem again today. I refactored this and rewrote it to avoid using the requests library. Now I do some basic parsing of the Link header and recursively call _list to get all the items.

This is a little slow for busy servers, but I think it is preferable to get all the data rather than silently truncate to 30 results. Perhaps we could change the default page size to improve performance, or make it a configurable option.

It might be worth handling -n and -N in such a way that we skip past pages rather than downloading them only to throw away the results.
