[Docs] Add/improve error code documentation of thrown GenkitErrors + retry handling/strategy #3029

Open
@Tyg-g

Description

Is your report related to a problem? Please describe.

I'm trying to determine which errors should be retried automatically.

My processing flow is quite complicated, so:

  1. ✅ I want to retry locally wherever possible, so that earlier steps are not rerun when that isn't needed. However,
  2. ❌ For errors where a retry makes no sense (validation error, unauthorized, invalid parameter), I want to skip retries. Otherwise earlier steps could be rerun unnecessarily, which wastes resources.
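
To make the two requirements concrete, here is a minimal sketch of the kind of local retry wrapper I have in mind. Everything in it (`withRetry`, `RetryOptions`, `isRetryable`) is a hypothetical helper of my own, not a Genkit API. The open question in this issue is what a correct `isRetryable` should look like.

```typescript
// Hypothetical helper, not part of Genkit: retries a single step locally
// unless the caller-supplied predicate says the error is permanent.
type RetryOptions = {
  maxAttempts: number;
  baseDelayMs: number;
  isRetryable: (err: unknown) => boolean;
};

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  step: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await step();
    } catch (err) {
      lastError = err;
      // Permanent errors (validation, unauthorized, ...) are rethrown at once,
      // as is the error from the final attempt.
      if (!opts.isRetryable(err) || attempt === opts.maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  // Only reached when maxAttempts < 1, i.e. the step never ran.
  throw lastError;
}
```

Wrapping each step separately keeps a retry local to that step, so earlier steps are not rerun.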

Is your report related to a suggestion/improvement? Please describe.

💡 Yes — I'd like to kindly ask the devs to improve the error documentation, with clear and structured explanations. I think this is quite a common scenario when implementing more complex workflows.

I'll break down my suggestion for the 2 major error types involved: GenerationResponseError and ValidationError.


Additional context


⚠️ GenerationResponseError

Please explain the status codes in detail.

Some of them seem straightforward:

  • "UNAUTHENTICATED" - probably 401 Unauthorized
  • "PERMISSION_DENIED" - probably 403 Forbidden
  • "RESOURCE_EXHAUSTED" - probably 429 Too Many Requests

🔁 Are these 1:1 mappings to HTTP status codes? Or are these statuses used in other cases?
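
For reference, these three names match the gRPC canonical status codes (`google.rpc.Code`), which document the HTTP mapping sketched below. That Genkit follows the same convention is my assumption, and `httpStatusFor` is a hypothetical helper. Note that in the gRPC convention the mapping is not 1:1: several codes share one HTTP status (for example, more than one code maps to 400).

```typescript
// HTTP mapping documented for the gRPC canonical codes (google.rpc.Code).
// Assumption: Genkit's status names follow the same convention.
const GRPC_HTTP_STATUS = new Map<string, number>([
  ["UNAUTHENTICATED", 401],
  ["PERMISSION_DENIED", 403],
  ["RESOURCE_EXHAUSTED", 429],
]);

// Hypothetical helper: returns undefined for names outside the table above.
function httpStatusFor(status: string): number | undefined {
  return GRPC_HTTP_STATUS.get(status);
}
```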

Other codes are more ambiguous:

  • "OUT_OF_RANGE" - is it for

    • 400 Bad Request (for an incorrect parameter), or
    • 416 Range Not Satisfiable (bad offset when downloading a resource), or
    • is it for some other specific case?
  • "INTERNAL" - is this for

    • 500 Internal Server Error on the LLM backend,
    • or is it a local/internal failure in the genkit module?

💭 Overall: More clarity would help with programmatic error handling and retry logic.
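
This is roughly the decision table I would like to be able to write with confidence. The sketch assumes the thrown error exposes a string `status` property. The policy in the table is my own guess, not documented Genkit behavior, and `decideRetry` is a hypothetical helper. `INTERNAL` is left as "unknown" precisely because I cannot tell from the docs what it covers.

```typescript
type RetryDecision = "retry" | "abort" | "unknown";

// Assumed policy for the statuses discussed above; not documented behavior.
const DECISION_BY_STATUS = new Map<string, RetryDecision>([
  ["RESOURCE_EXHAUSTED", "retry"], // rate limit or quota: back off and retry
  ["UNAUTHENTICATED", "abort"],    // credentials will not fix themselves
  ["PERMISSION_DENIED", "abort"],
  ["OUT_OF_RANGE", "abort"],       // a client-side error under either reading
  ["INTERNAL", "unknown"],         // backend 500 or local failure? unclear
]);

// Hypothetical helper: reads `status` defensively, since `err` is `unknown`.
function decideRetry(err: unknown): RetryDecision {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    if (typeof status === "string") {
      return DECISION_BY_STATUS.get(status) ?? "unknown";
    }
  }
  return "unknown";
}
```

With documented semantics for each status, the "unknown" bucket could shrink to almost nothing.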


🧪 ValidationError

It is important to be able to distinguish whether a validation failure occurred in an input schema or an output schema (both for prompts and flows):

  • 🟩 Output validation error (e.g. malformed LLM response)
    ➤ This is likely a random erroneous output from the LLM → should be retried.
  • 🟥 Input validation error (e.g. invalid prompt or parameters)
    ➤ This usually signals a bug → should not be retried, but should crash early.
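
As a workaround sketch, the distinction can be made on the caller's side by validating input before the call and output after it, and tagging the failure with the stage. All names here (`StagedValidationError`, `runValidated`, `shouldRetryValidation`) are hypothetical and not part of Genkit. The validators are plain type guards standing in for whatever schema library is in use.

```typescript
type Stage = "input" | "output";

// Hypothetical error type that records where validation failed.
class StagedValidationError extends Error {
  readonly stage: Stage;
  constructor(stage: Stage, message: string) {
    super(message);
    this.name = "StagedValidationError";
    this.stage = stage;
  }
}

type Validator<T> = (value: unknown) => value is T;

// Validates input before the step and output after it, so the caller can
// tell the two failure modes apart without parsing error messages.
async function runValidated<I, O>(
  rawInput: unknown,
  isInput: Validator<I>,
  isOutput: Validator<O>,
  step: (input: I) => Promise<unknown>,
): Promise<O> {
  if (!isInput(rawInput)) {
    throw new StagedValidationError("input", "input failed validation");
  }
  const rawOutput = await step(rawInput);
  if (!isOutput(rawOutput)) {
    throw new StagedValidationError("output", "output failed validation");
  }
  return rawOutput;
}

// Output failures are retried; input failures should crash early.
const shouldRetryValidation = (err: unknown): boolean =>
  err instanceof StagedValidationError && err.stage === "output";
```

Having the library report the stage itself would make this wrapper unnecessary.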

🧵 Final Thought

Yes, I know that the error objects contain human-readable details and messages. But that doesn't help with programmatic decision-making — especially because the more specific info (detail) has no specified structure.

📌 I think structured, documented guidance on what different errors mean and how to handle them (retry, abort, etc.) would massively improve DX and stability.

Thanks for the help. 🙏

Labels: docs (Improvements or additions to documentation)
