add pagination support to list #26

Open
wants to merge 1 commit into base: main
handle the pagination Link headers for _list requests
The patchwork REST API sends at most 30 items per request for API
endpoints which return a list. If the result set has more than 30 items,
pwclient only ever sees the first page, confusing users who expect to
see the entire set of items.

To handle this, the API includes 'Link' headers in the response which
indicate whether there is more data, and if so, what URL the data is
available at.
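As an illustration, a paginated response carries a Link header along these lines (the header value below is a hypothetical example of what a patchwork server might send, not taken from a real response):

```python
# Hypothetical example of a Link header on a paginated list response.
# Each comma-separated entry pairs a <URL> with a rel= relation; the
# rel="next" entry, when present, points at the next page of results.
link_header = (
    '<https://patchwork.example.com/api/patches/?page=2>; rel="next", '
    '<https://patchwork.example.com/api/patches/?page=5>; rel="last"'
)

# A response with no rel="next" entry means there is no further page.
has_next = 'rel="next"' in link_header
```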

Add a method to extract the page number query parameter from the Link
header URL if it exists. Use this to recursively call _list() until there
is no next page to obtain.
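The recursive accumulation described above can be sketched in isolation as follows. This is a minimal standalone sketch, not the patch itself: `fetch_page` and its three-page fake dataset are hypothetical stand-ins for pwclient's `_get()` and the patchwork API.

```python
# Fake dataset: page number -> (items on that page, next page or None).
PAGES = {1: (['a', 'b'], 2), 2: (['c', 'd'], 3), 3: (['e'], None)}

def fetch_page(page):
    """Hypothetical stand-in for an HTTP request returning one page."""
    return PAGES[page]

def list_all(page=1):
    items, next_page = fetch_page(page)
    if next_page is None:
        return items
    # Recurse until there is no "next" page, concatenating the results,
    # mirroring how _list() calls itself with the extracted page number.
    return items + list_all(next_page)
```

Calling `list_all()` walks all three fake pages and returns the concatenated items.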

Implement extraction of the page number as a new static method which deals
with the complexity of analyzing the Link headers. We only check the
first Link header, and only the first 'rel="next"' portion of it. Split
the discovered URL using urllib.parse.urlparse to find the query section
and locate the page=<value> parameter.
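The parsing steps described above can be reproduced on a representative header value (a hedged sketch mirroring the patch's logic; the example URL is hypothetical):

```python
import urllib.parse

# Hypothetical Link header value with a rel="next" entry first.
link_header = (
    '<https://patchwork.example.com/api/patches/?page=2>; rel="next", '
    '<https://patchwork.example.com/api/patches/?page=5>; rel="last"'
)

rel = '; rel="next"'

# Keep only the first comma-separated entry tagged rel="next",
# stripping the rel suffix to leave the bracketed <URL>.
url = next((l[:-len(rel)] for l in link_header.split(',') if l.endswith(rel)), None)
assert url is not None and url.startswith('<') and url.endswith('>')

# Drop the angle brackets, then pull page=<value> out of the query string.
parsed_link = urllib.parse.urlparse(url[1:-1])
page = next((x for x in parsed_link.query.split('&') if x.startswith('page=')), None)
page_nr = int(page[5:])  # → 2
```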

This approach avoids adding a new dependency (such as the requests
library), and should work for valid Link headers provided by patchwork. If
at any point we fail to find the expected data, link parsing stops and we
return the set of items found so far.

This should fix pwclient to properly handle multiple pages of data. It
will likely cause some invocations of pwclient to slow down, as they must
now query every available page, but that is preferable to silently
omitting available data.

A future improvement would be to extend -n and -N to work with pages so
that we avoid downloading unnecessary data.
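For instance, with a page-aware -n the number of pages worth fetching could be bounded up front (a hedged sketch; 30 is the patchwork default page size mentioned above, and `pages_needed` is a hypothetical helper, not part of this patch):

```python
import math

PAGE_SIZE = 30  # patchwork's default maximum items per list response

def pages_needed(max_items):
    # Number of pages required to cover max_items results; fetching any
    # pages beyond this would download data the user never sees.
    return math.ceil(max_items / PAGE_SIZE)
```

For example, a user asking for 31 items needs two pages, while 100 items need four.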

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
jacob-keller committed Nov 20, 2024
commit d118865dba3e8e93bf75592ad9199937a27b9b93
41 changes: 39 additions & 2 deletions pwclient/api.py
@@ -581,6 +581,28 @@ def _detail(
         data, _ = self._get(url)
         return json.loads(data)
 
+    @staticmethod
+    def _get_next_page(headers):
+        link_header = next((data for header, data in headers if header == 'Link'), None)
+        if link_header is None:
+            return None
+
+        rel = '; rel="next"'
+
+        url = next((l[:-len(rel)] for l in link_header.split(',') if l.endswith(rel)), None)
+        if url is None:
+            return None
+
+        if not (url.startswith('<') and url.endswith('>')):
+            return None
+
+        parsed_link = urllib.parse.urlparse(url[1:-1])
+        page = next((x for x in parsed_link.query.split('&') if x.startswith('page=')), None)
+        if page is None:
+            return None
+
+        return int(page[5:])
+
     def _list(
         self,
         resource_type,
@@ -594,8 +616,23 @@ def _list(
             url = f'{url}{resource_id}/{subresource_type}/'
         if params:
             url = f'{url}?{urllib.parse.urlencode(params)}'
-        data, _ = self._get(url)
-        return json.loads(data)
+        data, headers = self._get(url)
+
+        items = json.loads(data)
+
+        page_nr = self._get_next_page(headers)
+        if page_nr is None:
+            return items
+
+        if params is None:
+            params = {}
+        params['page'] = page_nr
+
+        items += self._list(resource_type, params,
+                            resource_id=resource_id,
+                            subresource_type=subresource_type)
+
+        return items

# project
