Commit c22ae04

Merge pull request #3125 from github/cklin/overlay-restore-timeout
Overlay: use restoreCache() timeout
2 parents 12dda79 + 80273e2

File tree

3 files changed (+70, −5 lines)


lib/analyze-action.js

Lines changed: 1 addition & 1 deletion
(Generated file; diff not rendered by default on GitHub.)

lib/init-action.js

Lines changed: 32 additions & 2 deletions
(Generated file; diff not rendered by default on GitHub.)

src/overlay-database-utils.ts

Lines changed: 37 additions & 2 deletions

```diff
@@ -158,7 +158,12 @@ function computeChangedFiles(
 // Constants for database caching
 const CACHE_VERSION = 1;
 const CACHE_PREFIX = "codeql-overlay-base-database";
-const MAX_CACHE_OPERATION_MS = 120_000; // Two minutes
+
+// The purpose of this ten-minute limit is to guard against the possibility
+// that the cache service is unresponsive, which would otherwise cause the
+// entire action to hang. Normally we expect cache operations to complete
+// within two minutes.
+const MAX_CACHE_OPERATION_MS = 600_000;
 
 /**
  * Checks that the overlay-base database is valid by checking for the
@@ -351,8 +356,38 @@ export async function downloadOverlayBaseDatabaseFromCache(
   try {
     const databaseDownloadStart = performance.now();
     const foundKey = await waitForResultWithTimeLimit(
+      // This ten-minute limit for the cache restore operation is mainly to
+      // guard against the possibility that the cache service is unresponsive
+      // and hangs outside the data download.
+      //
+      // Data download (which is normally the most time-consuming part of the
+      // restore operation) should not run long enough to hit this limit. Even
+      // for an extremely large 10GB database, at a download speed of 40MB/s
+      // (see below), the download should complete within five minutes. If we
+      // do hit this limit, there are likely more serious problems other than
+      // mere slow download speed.
+      //
+      // This is important because we don't want any ongoing file operations
+      // on the database directory when we do hit this limit. Hitting this
+      // time limit takes us to a fallback path where we re-initialize the
+      // database from scratch at dbLocation, and having the cache restore
+      // operation continue to write into dbLocation in the background would
+      // really mess things up. We want to hit this limit only in the case
+      // of a hung cache service, not just slow download speed.
       MAX_CACHE_OPERATION_MS,
-      actionsCache.restoreCache([dbLocation], cacheRestoreKeyPrefix),
+      actionsCache.restoreCache(
+        [dbLocation],
+        cacheRestoreKeyPrefix,
+        undefined,
+        {
+          // Azure SDK download (which is the default) uses 128MB segments; see
+          // https://github.com/actions/toolkit/blob/main/packages/cache/README.md.
+          // Setting segmentTimeoutInMs to 3000 translates to segment download
+          // speed of about 40 MB/s, which should be achievable unless the
+          // download is unreliable (in which case we do want to abort).
+          segmentTimeoutInMs: 3000,
+        },
+      ),
       () => {
         logger.info("Timed out downloading overlay-base database from cache");
       },
```
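The new code races `actionsCache.restoreCache()` against `MAX_CACHE_OPERATION_MS` via `waitForResultWithTimeLimit`. That helper's implementation is not part of this diff; the sketch below is one plausible shape for it, inferred from the call site (the name and argument order come from the diff, but the body is an assumption built on `Promise.race`):

```typescript
// Hypothetical sketch of waitForResultWithTimeLimit. The real helper in
// codeql-action is not shown in this diff; this body is an assumed
// implementation matching the call site (timeLimit, operation, onTimeout).
async function waitForResultWithTimeLimit<T>(
  timeoutMs: number,
  operation: Promise<T>,
  onTimeout: () => void,
): Promise<T | undefined> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // A timer promise that fires the callback and resolves to undefined
  // if the time limit elapses before the operation settles.
  const timeout = new Promise<undefined>((resolve) => {
    timer = setTimeout(() => {
      onTimeout();
      resolve(undefined);
    }, timeoutMs);
  });
  try {
    // Whichever promise settles first wins the race.
    return await Promise.race([operation, timeout]);
  } finally {
    // Clean up the timer so it cannot keep the process alive.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  }
}
```

Note that `Promise.race` does not cancel the losing promise: a timed-out `restoreCache()` call keeps running in the background. That is precisely the hazard the diff's longest comment addresses — the fallback path re-initializes the database at `dbLocation`, so the limit must only trip for a hung cache service, never for a merely slow download that might still be writing into `dbLocation`.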
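The figures quoted in the two new comments can be sanity-checked with simple arithmetic (all values taken from the diff: 128 MB segments, a 3000 ms segment timeout, a hypothetical 10 GB database, 40 MB/s):

```typescript
// Segment timeout: a 128 MB segment must arrive within 3000 ms, implying
// a sustained speed of roughly 42.7 MB/s -- the comment's "about 40 MB/s".
const segmentSizeMB = 128;
const segmentTimeoutMs = 3000;
const requiredMBps = segmentSizeMB / (segmentTimeoutMs / 1000);
console.log(requiredMBps.toFixed(1)); // "42.7"

// Whole-database estimate: a 10 GB database at 40 MB/s takes about
// 4.3 minutes, within the "five minutes" the comment expects and well
// under the ten-minute MAX_CACHE_OPERATION_MS limit.
const databaseGB = 10;
const downloadMBps = 40;
const downloadMinutes = (databaseGB * 1024) / downloadMBps / 60;
console.log(downloadMinutes.toFixed(1)); // "4.3"
```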
