Environment details
- Programming language: TypeScript / Node (project uses Bun, action runs on Node)
- OS: GitHub Runner (ubuntu-latest)
- Language runtime version: Node 20 (action's runtime)
- Package version: googleapis/release-please-action@v4. Also reproducible on @v5. Underlying release-please 17.3.0 and 17.6.0.
Steps to reproduce
- Have a workflow that runs release-please-action on push to main, with a downstream deploy job gated on needs.release-please.outputs['<path>--release_created'] == 'true'.
- Merge a release-please PR (e.g. chore(main): release X.Y.Z).
- While the workflow runs, encounter a transient GraphQL failure during pass-2 (createPullRequests's commit.history query). The easiest natural reproduction is to run during a GitHub "Disruption with some GitHub services" incident. We caught this on 2026-04-27 between 16:48Z and 19:02Z. Symptom in logs:
Creating 1 releases for pull #41 (pass 1 succeeded, tag v1.10.1 cut)
Fetching merge commits on branch main with cursor: undefined (pass 2 starts)
##[error]release-please failed: Request failed due to following response errors:
- Something went wrong while executing your query on 2026-04-27T18:46:36Z.
Please include `2401:3F4A02:1120138:44A0780:69EFAF0C` when reporting...
- Observe: the GitHub release and tag for X.Y.Z are created (pass-1 completed), but the release-please job's conclusion is failure, the downstream deploy job is skipped, and re-running the workflow does not recover. The action no longer re-emits release_created=true on subsequent runs because the release already exists.
What's wrong
release-please-action's main() (verified byte-identical in v4 and v5) runs two passes in one step:
- Pass-1: manifest.createReleases() cuts the release and writes <path>--release_created=true via core.setOutput.
- Pass-2: manifest.createPullRequests() scans commit.history to draft the next release PR. This pass is independent of the release just cut.
If pass-2 throws, core.setFailed marks the entire step failed. Because the job then fails, needs: consumers are skipped and never act on release_created, even though pass-1 wrote it before pass-2 ran. Once a tag exists, no future workflow run will re-emit the output, so the deploy gate becomes permanently unreachable for that release.
Same symptom independently reported in #1202 today (Rust project, v5), confirming this is neither language- nor repo-specific. Recurring reports: #867 (2023), #976, #977 (2024), #1202 (2026). Each prior issue has been closed without addressing the coupling.
Three independent root issues
(A) Retry only catches HTTP 502. From release-please/src/github.ts graphqlRequest:
    if ((err as GitHubAPIError).status !== 502) {
      throw err;
    }
This check is identical in 17.3.0 and 17.6.0. The 2026-04-27 incident returned a GraphqlResponseError (HTTP 200, errors: [{message: "Something went wrong..."}] in body). No 502 means no retry. Retry should also cover 503, 504, and any GraphqlResponseError whose messages match /something went wrong|server error/i.
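The broadened retry condition proposed in (A) could look roughly like the sketch below. It is only an illustration: ApiErrorLike, isRetryable, and withRetry are hypothetical names, the error shape is an assumption rather than release-please's actual error classes, and the attempt count and delay are arbitrary.

```typescript
// Hypothetical error shape (an assumption, not release-please's real types):
// an optional HTTP status plus the optional GraphQL `errors` array from the body.
interface ApiErrorLike {
  status?: number;
  errors?: Array<{ message: string }>;
}

const RETRYABLE_STATUSES = [502, 503, 504];
const TRANSIENT_MESSAGE = /something went wrong|server error/i;

// True for gateway-style HTTP failures and for HTTP 200 responses whose
// GraphQL errors look like a transient server-side fault.
function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ApiErrorLike;
  if (e.status !== undefined && RETRYABLE_STATUSES.indexOf(e.status) !== -1) {
    return true;
  }
  return (e.errors ?? []).some((entry) => TRANSIENT_MESSAGE.test(entry.message));
}

// Retries `fn` up to `maxAttempts` times, waiting a little longer after each
// retryable failure. Non-retryable errors are rethrown immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

Matching on message text is fragile, so a real patch would probably prefer a structured error type or code if the API exposes one.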
(B) Pass-1 and pass-2 share a single failure surface. Wrapping pass-2 in its own try/catch and surfacing pass-2 errors as core.warning (rather than core.setFailed) would let pass-1's outputs survive a transient pass-2 failure.
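A minimal sketch of the decoupling in (B), assuming simplified stand-ins for the objects involved. CoreLike mirrors only the three @actions/core functions mentioned above, ManifestLike mirrors only the two manifest methods, and runDecoupled plus the { path } release shape are hypothetical simplifications, not the action's actual main().

```typescript
// Stand-in for the subset of @actions/core used here.
interface CoreLike {
  setOutput(name: string, value: string): void;
  warning(message: string): void;
  setFailed(message: string): void;
}

// Stand-in for the two manifest passes. The release shape is an assumption.
interface ManifestLike {
  createReleases(): Promise<Array<{ path: string } | undefined>>;
  createPullRequests(): Promise<unknown[]>;
}

async function runDecoupled(manifest: ManifestLike, core: CoreLike): Promise<void> {
  // Pass-1: a failure here is a real failure, so the step still fails.
  try {
    const releases = await manifest.createReleases();
    for (const release of releases) {
      if (release) core.setOutput(`${release.path}--release_created`, "true");
    }
  } catch (err) {
    core.setFailed(`release-please failed: ${String(err)}`);
    return;
  }

  // Pass-2: a failure here is downgraded to a warning so the step (and the
  // outputs pass-1 already wrote) stay usable by downstream jobs.
  try {
    await manifest.createPullRequests();
  } catch (err) {
    core.warning(`release PR pass failed after releases were created: ${String(err)}`);
  }
}
```

The trade-off is that a persistent pass-2 failure would surface only as a warning, so maintainers might want an opt-in input that restores the hard failure.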
(C) Pass-2 is unconditional even when pointless. When HEAD is the release commit pass-1 just cut, pass-2 has zero new commits to scan and zero PRs to open. A short-circuit (if (headSha === justCutReleaseSha) return;) would skip the API call entirely for the most common trigger and eliminate this failure mode there.
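The short-circuit in (C) might be expressed as a small predicate like the one below. This is a sketch under assumptions: CutRelease and shouldSkipPullRequestPass are hypothetical names, and it presumes pass-1 can report the commit SHA each release was cut from and that the workflow's HEAD SHA is available to the action.

```typescript
// Hypothetical shape: only the commit SHA a pass-1 release was cut from.
interface CutRelease {
  sha: string;
}

// True when HEAD is exactly a commit that pass-1 just released, in which case
// there are no newer commits for pass-2 to scan.
function shouldSkipPullRequestPass(
  headSha: string,
  cutReleases: ReadonlyArray<CutRelease | undefined>
): boolean {
  return cutReleases.some((r) => r !== undefined && r.sha === headSha);
}
```

If pass-1 cut no releases, the predicate returns false and pass-2 runs as it does today.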
Workaround for affected users
continue-on-error: true on the action step preserves pass-1's outputs even if pass-2 throws. Pass-1 outputs are written via core.setOutput before pass-2 begins (verified in src/index.ts main()), so downstream gates on <path>--release_created remain accurate.
Confirmed v5 does not fix this: src/index.ts main() is byte-identical between v4 and v5. The v5.0.0 release notes are limited to a Node 24 bump and a release-please lib bump from 17.3.0 to 17.6.0, neither of which changes retry coverage or pass coupling.
Asks (in priority order)
- Decouple pass-2 failures from pass-1 outputs at the action level (issue B). Easiest, fully within this repo.
- Broaden retry coverage in release-please to include 503, 504, and matching GraphqlResponseError (issue A).
- Short-circuit pass-2 when no work is possible (issue C).