Git fetcher should fall back to clone on TAR_BAD_ARCHIVE, not just HTTP errors #476

@babyhuey

Summary

When installing a gitlab: shorthand git dependency from a private GitLab repo, pacote's git fetcher tries the hosted tarball URL first. For private repos, GitLab answers unauthenticated requests with a 302 redirect to /users/sign_in, and that page returns HTTP 200. Because the final response is 200 rather than an error status, npm's HTTP client treats the download as successful. The tar extractor then tries to parse the HTML sign-in page as a tarball and fails with TAR_BAD_ARCHIVE.

The fallback to git clone in lib/git.js only triggers on HTTP errors:

```javascript
// lib/git.js ~line 258
}).extract(tmp).then(() => handler(...), er => {
  // fall back to ssh download if tarball fails
  if (er.constructor.name.match(/^Http/)) {
    return this.#clone(handler, false)
  } else {
    throw er
  }
})
```

TAR_BAD_ARCHIVE is not an HTTP error, so the handler rethrows it instead of falling back to clone. A one-line fix resolves the issue:

```diff
- if (er.constructor.name.match(/^Http/)) {
+ if (er.constructor.name.match(/^Http/) || er.code === 'TAR_BAD_ARCHIVE') {
```
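To make the proposed condition easier to reason about in isolation, here is a minimal sketch of it as a standalone predicate. The helper name `shouldFallBackToClone` and the `HttpErrorExample` class are hypothetical stand-ins for illustration only; they are not part of pacote, and the real error classes may differ.

```javascript
// Hypothetical helper (not in pacote): mirrors the condition from the diff above.
function shouldFallBackToClone (er) {
  if (!er) return false
  // Existing behaviour: any error whose class name starts with "Http"
  const isHttpError = /^Http/.test(er.constructor.name)
  // Proposed addition: the tar extractor rejected the downloaded body
  const isBadArchive = er.code === 'TAR_BAD_ARCHIVE'
  return isHttpError || isBadArchive
}

// Stand-in error objects, assumed shapes for illustration
class HttpErrorExample extends Error {}
const badArchive = Object.assign(
  new Error('Unrecognized archive format'),
  { code: 'TAR_BAD_ARCHIVE' }
)

console.log(shouldFallBackToClone(new HttpErrorExample('404'))) // true
console.log(shouldFallBackToClone(badArchive))                  // true
console.log(shouldFallBackToClone(new Error('disk full')))      // false
```

Unrelated failures (the third case) still propagate, so the change only widens the fallback to the one extra error code.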

Steps to reproduce

  1. Have a private repo on GitLab (e.g., gitlab:myorg/my-private-pkg#1.0.0)
  2. Ensure no GitLab HTTPS auth is configured (only SSH)
  3. Run npm install gitlab:myorg/my-private-pkg#1.0.0

What happens

```
npm warn tar TAR_ENTRY_INVALID checksum failure
npm warn tar TAR_BAD_ARCHIVE: Unrecognized archive format
npm error code TAR_BAD_ARCHIVE
npm error TAR_BAD_ARCHIVE: Unrecognized archive format
```

Debug logs show the tar parser receiving HTML (<!DOCTYPE html> / GitLab sign-in page) instead of a tarball archive. The HTTP request to https://gitlab.com/{user}/{project}/repository/archive.tar.gz?ref={tag} gets a 302 redirect to /users/sign_in, which returns 200 with HTML.
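For anyone trying to confirm the same symptom locally, the following is a rough diagnostic sketch that classifies the first bytes of a saved response body. It relies on the gzip magic bytes (0x1f 0x8b) and a simple check for an HTML prologue. The function names are hypothetical, and this is not something pacote is known to do; it is only a way to check what the server actually sent.

```javascript
// Hypothetical diagnostic helpers, not part of pacote or npm.

// A gzip stream always starts with the two magic bytes 0x1f 0x8b.
function looksLikeGzip (buf) {
  return buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b
}

// Crude HTML check: inspect the first bytes for a doctype or <html> tag.
function looksLikeHtml (buf) {
  const head = buf.subarray(0, 256).toString('utf8').trimStart().toLowerCase()
  return head.startsWith('<!doctype html') || head.startsWith('<html')
}

// Example inputs: a gzip header and the start of a sign-in page
const gzipHead = Buffer.from([0x1f, 0x8b, 0x08, 0x00])
const htmlHead = Buffer.from('<!DOCTYPE html>\n<html class="devise-layout-html">')

console.log(looksLikeGzip(gzipHead), looksLikeHtml(gzipHead)) // true false
console.log(looksLikeGzip(htmlHead), looksLikeHtml(htmlHead)) // false true
```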

What should happen

pacote should fall back to git clone (like it does for HTTP errors) when the tarball extraction fails, since tar errors after a "successful" HTTP download indicate the response wasn't actually a tarball.

Environment

  • npm 11.11.0 (Node 24.14.1) — also reproducible on npm 10.x / Node 20, Node 22
  • pacote version: bundled with npm
  • hosted-git-info 9.0.2
  • GitLab.com (private repos)

Related

This likely affects all hosted git providers that return 200 with HTML for unauthenticated archive requests instead of a proper HTTP error status.
