Here's the pseudocode I was thinking of. It's roughly how the JS client works.
It relies on the rate limiter. I think that an even smarter version would mirror the backend's queue so that it doesn't bother polling for files that are likely waiting in the backend's queue. But we need to be careful if we do that.
```python
# Initial fetch to get jobs started on the server.
for file in files:
    attempt_fetch(file)

# Refetch in order.
while True:
    try:
        next_time, file = priority_queue.get_nowait()
    except queue.Empty:
        break

    # Wait until the next fetch.
    current_time = seconds_since_epoch()
    delta_time = next_time - current_time
    if delta_time > 0:
        time.sleep(delta_time)
    attempt_fetch(file)

if failed_files:
    raise Exception('Not all files finished')
```
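For illustration, the pieces above can be wired into a self-contained, runnable sketch. Everything backend-facing here is a stand-in: `FakeRepository` pretends each file finishes generating on its second fetch, `seconds_since_epoch()` is just `time.time()`, and the delay is shortened so the demo completes quickly.

```python
import queue
import time
from dataclasses import dataclass

MAX_ATTEMPTS = 3
SINGLE_FILE_DELAY = 0.01  # shortened for the demo; the real value would be larger


class NotYetGeneratedException(Exception):
    pass


@dataclass(order=True)  # order=True lets the queue break timestamp ties
class File:
    path: str
    attempt_count: int = 0


class FakeRepository:
    """Stand-in for the real backend: each file 'generates' on its second fetch."""

    def __init__(self):
        self.calls = {}

    def get_generated_json(self, path):
        self.calls[path] = self.calls.get(path, 0) + 1
        if self.calls[path] < 2:
            raise NotYetGeneratedException(path)
        return [f'{path}#component']


repository = FakeRepository()
priority_queue = queue.PriorityQueue()
failed_files = []
all_components = []


def attempt_fetch(file):
    file.attempt_count += 1
    try:
        result = repository.get_generated_json(file.path)
    except NotYetGeneratedException:
        if file.attempt_count < MAX_ATTEMPTS:
            # Retry later.
            priority_queue.put((time.time() + SINGLE_FILE_DELAY, file))
        else:
            failed_files.append(file)
    else:
        all_components.extend(result)


# Initial fetch to get jobs started on the server.
files = [File('a.json'), File('b.json')]
for file in files:
    attempt_fetch(file)

# Refetch in order.
while True:
    try:
        next_time, file = priority_queue.get_nowait()
    except queue.Empty:
        break
    delta_time = next_time - time.time()
    if delta_time > 0:
        time.sleep(delta_time)
    attempt_fetch(file)
```

After the loop drains, `all_components` holds both files' results and `failed_files` is empty.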
The helper code looks something like this. Since we already use `RateLimitedSession`, jitter to prevent a thundering herd shouldn't be needed, but it's something to consider.
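If we did decide jitter was worth it, a minimal sketch could perturb the retry delay before enqueueing. The `JITTER_FRACTION` constant and `delay_with_jitter()` helper are hypothetical names, not part of the existing code; the base delay value here is assumed for illustration.

```python
import random

SINGLE_FILE_DELAY = 5.0  # assumed base retry delay, in seconds
JITTER_FRACTION = 0.2    # hypothetical: spread retries by up to +/-20%


def delay_with_jitter(base_delay: float = SINGLE_FILE_DELAY) -> float:
    """Return the retry delay scaled by a random factor, so that a batch
    of files failing at the same moment doesn't all retry at once."""
    return base_delay * random.uniform(1 - JITTER_FRACTION, 1 + JITTER_FRACTION)
```

The enqueue call would then become `priority_queue.put((now + delay_with_jitter(), file))`.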
```python
@dataclass
class File:
    path: str
    attempt_count: int

priority_queue = queue.PriorityQueue()
failed_files = []

def attempt_fetch(file):
    file.attempt_count += 1
    try:
        result = repository.get_generated_json(file.path)
    except NotYetGeneratedException:
        if file.attempt_count < MAX_ATTEMPTS:
            # Since RateLimitedSession sleeps, an arbitrary amount of time may
            # have gone by since we called it.
            now = seconds_since_epoch()
            # Retry later.
            priority_queue.put((now + SINGLE_FILE_DELAY, file))
        else:
            # Failed for good.
            failed_files.append(file)
    else:
        all_components.extend(result)
```
Python's built-in priority queue is synchronized, so it's less efficient than it could be. But the vast majority of our time will be spent sleeping or doing I/O, so it doesn't really matter.
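Since this loop is single-threaded, one unsynchronized alternative (sketched here as an assumption, not a change we necessarily need) is a plain list managed by `heapq`, which gives the same `(next_time, ...)` ordering without any locking:

```python
import heapq

# A plain list used as a heap: no locking, same tuple ordering as
# queue.PriorityQueue. Paths stand in for File objects here.
retry_heap = []

heapq.heappush(retry_heap, (105.0, 'b.json'))
heapq.heappush(retry_heap, (100.0, 'a.json'))

# Pops come out in timestamp order; an empty heap raises IndexError
# rather than queue.Empty.
next_time, path = heapq.heappop(retry_heap)  # → (100.0, 'a.json')
```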
The next step in improving this, in my opinion, would be having attempt_fetch() issue the request asynchronously and return immediately without blocking. That would allow us to send off many requests simultaneously, without requiring the overall structure of this code to change.
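One way that could look, as a rough sketch: submit each request to a thread pool and collect futures instead of results. Here `fetch_json` is a hypothetical stand-in for `repository.get_generated_json`, and `attempt_fetch_async` is an illustrative name, not existing code.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_json(path):
    """Stand-in for repository.get_generated_json."""
    return [f'component-from-{path}']

executor = ThreadPoolExecutor(max_workers=8)

def attempt_fetch_async(path):
    """Submit the request and return a Future immediately; the caller
    collects results later instead of blocking on each response."""
    return executor.submit(fetch_json, path)

# Many requests go out concurrently; only the final gather blocks.
futures = [attempt_fetch_async(p) for p in ('a.json', 'b.json')]
all_components = [c for f in futures for c in f.result()]
```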
Split out from: #51 (comment)