
Improve polling logic #58

Open
jtran opened this issue Oct 31, 2023 · 0 comments
jtran commented Oct 31, 2023

Split out from: #51 (comment)

Here's the pseudocode I was thinking of. It's roughly how the JS client works.

It relies on the rate limiter. I think that an even smarter version would mirror the backend's queue so that it doesn't bother polling for files that are likely waiting in the backend's queue. But we need to be careful if we do that.

# Initial fetch to get jobs started on the server.
for file in files:
    attempt_fetch(file)

# Refetch in order.
while True:
    try:
        # get_nowait() raises queue.Empty instead of blocking forever.
        next_time, file = priority_queue.get_nowait()
    except queue.Empty:
        break
    # Wait until the next fetch.
    current_time = seconds_since_epoch()
    delta_time = next_time - current_time
    if delta_time > 0:
        time.sleep(delta_time)

    attempt_fetch(file)

if failed_files:
    raise Exception('Not all files finished')

The helper code looks something like this. Since we already use RateLimitedSession, jitter to prevent thundering herd shouldn't be needed, but it's something to consider.

import queue
import time
from dataclasses import dataclass

# order=True lets the priority queue break timestamp ties by comparing
# File instances instead of raising TypeError.
@dataclass(order=True)
class File:
    path: str
    attempt_count: int = 0

priority_queue = queue.PriorityQueue()
failed_files = []
all_components = []

def attempt_fetch(file):
    file.attempt_count += 1
    try:
        result = repository.get_generated_json(file.path)
    except NotYetGeneratedException:
        if file.attempt_count < MAX_ATTEMPTS:
            # Since RateLimitedSession sleeps, an arbitrary amount of time may
            # have gone by since we called it.
            now = seconds_since_epoch()
            # Retry later.
            priority_queue.put((now + SINGLE_FILE_DELAY, file))
        else:
            # Failed for good.
            failed_files.append(file)
    else:
        all_components.extend(result)
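If jitter ever does become necessary, a minimal sketch might look like the following. The delay constant and helper name are assumptions for illustration, not part of the existing code:

```python
import random

# Hypothetical base retry delay; stands in for SINGLE_FILE_DELAY above.
SINGLE_FILE_DELAY = 5.0

def next_retry_time(now: float) -> float:
    # Spread retries over +/- 20% of the base delay so files scheduled
    # at the same instant don't all refetch at once (thundering herd).
    jitter = random.uniform(-0.2, 0.2) * SINGLE_FILE_DELAY
    return now + SINGLE_FILE_DELAY + jitter
```

The jittered time would replace `now + SINGLE_FILE_DELAY` in the `priority_queue.put()` call.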

Python's built-in priority queue is synchronized, so it's less efficient than it could be. But the vast majority of our time will be spent sleeping or doing I/O, so it doesn't really matter.
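For reference, the unsynchronized alternative would be a plain list managed with `heapq`, which is what `queue.PriorityQueue` wraps internally. A minimal sketch (file names are placeholders):

```python
import heapq
import time

# A bare heap of (next_fetch_time, path) tuples; no locking overhead,
# but only safe from a single thread.
pq: list[tuple[float, str]] = []
heapq.heappush(pq, (time.time() + 5.0, "b.json"))
heapq.heappush(pq, (time.time() + 1.0, "a.json"))

# heappop() always returns the entry with the earliest scheduled time.
next_time, path = heapq.heappop(pq)
```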

The next step in improving this, in my opinion, would be to have attempt_fetch() send the request asynchronously and return immediately without blocking. That would allow us to send off many requests simultaneously without changing the overall structure of this code.
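One way that non-blocking variant could look is a thread pool that owns the requests, with attempt_fetch() returning a future immediately. This is a hypothetical sketch: `fetch_json()` stands in for the real `repository.get_generated_json()` call, and the pool size is arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor, Future, as_completed

def fetch_json(path: str) -> dict:
    # Placeholder for the real rate-limited HTTP request.
    return {"path": path}

executor = ThreadPoolExecutor(max_workers=8)

def attempt_fetch_async(path: str) -> Future:
    # Returns immediately; the request runs on a worker thread.
    return executor.submit(fetch_json, path)

# Many requests can be in flight at once; results are collected as
# each one completes.
futures = [attempt_fetch_async(p) for p in ("a.json", "b.json")]
results = [f.result() for f in as_completed(futures)]
```

The polling loop would then check completed futures and reschedule not-yet-generated files, rather than blocking on each fetch in turn.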
