What happened?
The Replicate handler gracefully handles the interim "processing" state, but it does not handle the "starting" state: if a prediction is still starting when it is polled, the handler throws an exception.
The current check in completion() and async_completion() is:
if response.status_code == 200 and response.json().get("status") == "processing":
    continue
A more accurate implementation would be:
if response.status_code == 200 and response.json().get("status") not in ["succeeded", "failed", "canceled"]:
    continue
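To illustrate the idea outside of LiteLLM, here is a minimal, self-contained sketch of a polling loop that keeps waiting on any non-terminal status. It is not the handler's actual code: `TERMINAL_STATUSES`, `should_keep_polling`, `poll_until_done`, and the `fetch` callable are hypothetical names made up for this example, and the fake responses stand in for real HTTP calls.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Statuses after which the prediction will not change again.
# Anything else (e.g. "starting", "processing") means "keep polling".
TERMINAL_STATUSES = frozenset({"succeeded", "failed", "canceled"})


def should_keep_polling(status_code: int, body: Dict[str, Any]) -> bool:
    """Return True while the prediction is in any non-terminal state."""
    return status_code == 200 and body.get("status") not in TERMINAL_STATUSES


def poll_until_done(
    fetch: Callable[[], Tuple[int, Dict[str, Any]]],  # hypothetical: returns (status_code, json_body)
    max_attempts: int = 10,
    delay: float = 0.0,
) -> Dict[str, Any]:
    """Call fetch() until the prediction reaches a terminal status."""
    for _ in range(max_attempts):
        status_code, body = fetch()
        if status_code != 200:
            raise RuntimeError(f"unexpected HTTP status: {status_code}")
        if should_keep_polling(status_code, body):
            time.sleep(delay)
            continue
        return body
    raise TimeoutError("prediction did not reach a terminal status")


# Simulated sequence of poll results, including the "starting" state.
fake_responses = iter(
    [
        (200, {"status": "starting"}),
        (200, {"status": "processing"}),
        (200, {"status": "succeeded", "output": "hello"}),
    ]
)
final = poll_until_done(lambda: next(fake_responses))
```

With the original `== "processing"` check, the first simulated response ("starting") would fall through to the code that expects a finished prediction. Checking against the terminal set also tolerates any other non-terminal status without another code change.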
I implemented this change in a fork and verified that it works in our IBM Granite notebooks (e.g., this notebook).
cc: @bjhargrave (IBM), @zeke (Replicate), @aron (Replicate)
Relevant log output
Are you a ML Ops Team?
No
What LiteLLM version are you on ?
v1.79.1-stable-patch-1
Twitter / LinkedIn details
No response