Skip to content

Comments

fix: graduate PET configure timeout to avoid killing process prematurely (Fixes #1274)#1275

Merged
karthiknadig merged 4 commits intomainfrom
bug/issue-1274
Feb 25, 2026
Merged

fix: graduate PET configure timeout to avoid killing process prematurely (Fixes #1274)#1275
karthiknadig merged 4 commits intomainfrom
bug/issue-1274

Conversation

@karthiknadig
Copy link
Member

@karthiknadig karthiknadig commented Feb 25, 2026

Fix PET configure timeout killing the process prematurely, which prevents recovery on large workspaces.

Changes

  • Introduce RpcTimeoutError class with a method property for type-safe timeout detection instead of fragile string matching
  • Graduate configure timeouts with exponential backoff on timeout duration: 1st attempt 30s, retry 60s, then kill — giving PET up to 90s total before restart
  • Move lastConfiguration cache write to after successful response — previously a failed configure would cache the options, causing subsequent calls to skip the configure request entirely
  • Clear lastConfiguration on any configure failure to ensure retries are not skipped by configurationEquals
  • Prevent double-kill: resolve() and doRefresh() no longer kill the process on configure timeouts (already handled inside configure())
  • Reset configureTimeoutCount on successful configure and on process restart

Root Cause

When PET takes >30s to process configuration (e.g., large filesystem with glob patterns like **/.venv in dev containers), the extension immediately killed it and restarted. PET then had to re-process the same configuration from scratch, hitting the same timeout again, burning through all 3 restart attempts and leaving PET permanently dead.

Fixes #1274
Fixes #1266

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where the Python Environment Tools (PET) process was being killed prematurely when configure requests took longer than 30 seconds on large workspaces. The fix implements a graduated timeout approach that allows PET to continue working through the first timeout and only kills the process after two consecutive timeouts.

Changes:

  • Introduced RpcTimeoutError class for type-safe timeout detection instead of fragile string matching
  • Implemented graduated configure timeouts: first timeout allows PET to continue, second consecutive timeout kills and restarts the process
  • Moved lastConfiguration cache write to after successful response to ensure failed configures are retried
  • Prevented double-kill by checking error type in resolve() and doRefresh() methods
  • Added configureTimeoutCount reset on successful configure and process restart
Comments suppressed due to low confidence (1)

src/managers/common/nativePythonFinder.ts:631

  • The configureTimeoutCount should be reset when the configuration options change between configure attempts. Currently, if configuration A times out once, then a different configuration B times out on the next attempt, the process will be killed even though these are different configurations. Consider resetting configureTimeoutCount when sending a new configuration that differs from the one that previously timed out. This could be done by storing the configuration that timed out and comparing it with the new configuration before incrementing the count.
    private async configure() {
        // Get all extra search paths including legacy settings and new searchPaths
        const extraSearchPaths = await getAllExtraSearchPaths();

        const options: ConfigurationOptions = {
            workspaceDirectories: this.api.getPythonProjects().map((item) => item.uri.fsPath),
            environmentDirectories: extraSearchPaths,
            condaExecutable: getPythonSettingAndUntildify<string>('condaPath'),
            pipenvExecutable: getPythonSettingAndUntildify<string>('pipenvPath'),
            poetryExecutable: getPythonSettingAndUntildify<string>('poetryPath'),
            cacheDirectory: this.cacheDirectory?.fsPath,
        };
        // No need to send a configuration request if there are no changes.
        if (this.lastConfiguration && this.configurationEquals(options, this.lastConfiguration)) {
            this.outputChannel.debug('[pet] configure: No changes detected, skipping configuration update.');
            return;
        }
        this.outputChannel.info('[pet] configure: Sending configuration update:', JSON.stringify(options));
        try {
            await sendRequestWithTimeout(this.connection, 'configure', options, CONFIGURE_TIMEOUT_MS);
            // Only cache after success so failed/timed-out calls will retry
            this.lastConfiguration = options;
            this.configureTimeoutCount = 0;
        } catch (ex) {
            if (ex instanceof RpcTimeoutError) {
                this.configureTimeoutCount++;
                if (this.configureTimeoutCount >= MAX_CONFIGURE_TIMEOUTS_BEFORE_KILL) {
                    // Repeated configure timeouts suggest PET is truly hung — kill and restart
                    this.outputChannel.error(
                        `[pet] Configure timed out ${this.configureTimeoutCount} consecutive times, killing hung process for restart`,
                    );
                    this.killProcess();
                    this.processExited = true;
                    this.configureTimeoutCount = 0;
                } else {
                    // First timeout — PET may still be working. Let it continue and retry next call.
                    this.outputChannel.warn(
                        `[pet] Configure request timed out (attempt ${this.configureTimeoutCount}/${MAX_CONFIGURE_TIMEOUTS_BEFORE_KILL}), ` +
                            'will retry on next request without killing process',
                    );
                }
            } else {
                this.outputChannel.error('[pet] configure: Configuration error', ex);
            }
            throw ex;
        }
    }

@karthiknadig karthiknadig added the bug Issue identified by VS Code Team member as probable bug label Feb 25, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/managers/common/nativePythonFinder.ts:165

  • sendRequestWithTimeout() doesn't clear the setTimeout() timer when the JSON-RPC request completes successfully. If the request resolves before timeoutMs, the timer will still fire later, call cts.cancel(), and reject timeoutPromise, which can cause an unhandled promise rejection and potentially cancel a later/finished request unexpectedly. Refactor so the timeout handle is accessible outside timeoutPromise and is always clearTimeout()'d in a finally once the race settles (or otherwise ensure the timeout promise cannot reject after success).
    const cts = new CancellationTokenSource();
    const timeoutPromise = new Promise<never>((_, reject) => {
        const timer = setTimeout(() => {
            cts.cancel();
            reject(new RpcTimeoutError(method, timeoutMs));
        }, timeoutMs);
        // Clear timeout if the CancellationTokenSource is disposed
        cts.token.onCancellationRequested(() => clearTimeout(timer));
    });

    try {
        return await Promise.race([connection.sendRequest<T>(method, params, cts.token), timeoutPromise]);
    } finally {
        cts.dispose();
    }

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@karthiknadig karthiknadig enabled auto-merge (squash) February 25, 2026 04:00
@karthiknadig karthiknadig merged commit f983b64 into main Feb 25, 2026
27 of 33 checks passed
@karthiknadig karthiknadig deleted the bug/issue-1274 branch February 25, 2026 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Issue identified by VS Code Team member as probable bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PET configure timeout kills process, preventing recovery on large workspaces Extension breaks uv venv interpreter detection in dev containers

2 participants