Description
In internal issue 195911158, a customer is struggling to retry jobs that fail with "403 Exceeded rate limits: too many table update operations for this table". One can encounter this exception by attempting to run hundreds of load jobs in parallel.
Thoughts:

- Try to reproduce. Does the exception happen at `result()` or `load_table_from_uri()`? If `result()`, continue with `job_retry`; otherwise, see if we can modify the default retry predicate for `load_table_from_uri()` to detect this rate limiting reason and retry.
- Assuming the exception does happen at `result()`, modify load jobs (or more likely the base class) to retry if `job_retry` is set, similar to what we do for query jobs.
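To pinpoint which call the 403 surfaces from, a minimal triage sketch (the `run_load` and `fan_out` helpers and the fan-out shape are my own, not part of the library; `client` is assumed to be a `google.cloud.bigquery.Client` or any stand-in with the same shape):

```python
from concurrent.futures import ThreadPoolExecutor


def run_load(client, uri, table_id):
    """Submit one load job and report which call raised (hypothetical helper)."""
    try:
        job = client.load_table_from_uri(uri, table_id)
    except Exception as exc:  # raised at submission time
        return ("load_table_from_uri", exc)
    try:
        job.result()  # raised while polling for job completion
    except Exception as exc:
        return ("result", exc)
    return ("ok", None)


def fan_out(client, uris, table_id, workers=100):
    """Run many loads in parallel, mimicking the customer's workload."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: run_load(client, u, table_id), uris))
```

Running this against a real table with a few hundred URIs should tell us whether the rate limit error is raised at submission or while waiting on the job.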
Notes:

- I suspect we'll need a different default `job_retry` object for `load_table_from_uri()`, as the retryable reasons will likely be different from what we have for queries.
- I don't think the other `load_table_from_*` methods are as retryable as `load_table_from_uri()`, since they would require rewinding file objects, which isn't always possible. We'll probably want to consider adding `job_retry` to those load job methods in the future, but for now `load_table_from_uri()` is what's needed.
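A sketch of what a load-specific retry predicate might look like. BigQuery API exceptions carry a list of error dicts with a `"reason"` field on their `errors` attribute; the predicate name and the exact set of retryable reasons below are assumptions to be validated, not the library's actual defaults:

```python
def is_retryable_load_error(exc):
    """Hypothetical predicate for a load-job `job_retry`.

    Inspects the per-error "reason" fields that BigQuery API exceptions
    expose via an `errors` attribute. The reason set here is a guess for
    load jobs and would need to be confirmed against real failures.
    """
    retryable_reasons = {"rateLimitExceeded", "backendError"}
    errors = getattr(exc, "errors", None) or []
    return any(e.get("reason") in retryable_reasons for e in errors)
```

In the eventual fix, a predicate like this would presumably be wrapped in a `google.api_core.retry.Retry` and installed as the default `job_retry` for `load_table_from_uri()`, parallel to the existing query-job default.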