[discussion] Improving Jest performance with a custom resolver #11034

@lencioni

I work in a fairly large codebase (~80,000 TypeScript files plus ~250,000 files in node_modules). I recently fixed a performance problem we were seeing with Jest and wanted to share my findings and code somewhere others might find them useful. Please feel free to close this issue.

We use Jest features like --findRelatedTests and --changedSince in parts of our CI setup. I wanted to see which test files these selected, so I ran them locally with the --listTests flag and noticed that it was very slow (~150 seconds).

Here's a CPU profile I took in DevTools:

[Image: CPU profile]

Zoomed in, that very large block looks essentially like this:

[Image: zoomed-in view of the profile]

I discovered that almost all of the time was spent in the loadpkg function of the resolve package, which Jest uses as part of its default resolver: https://github.com/facebook/jest/blob/baf9f9937720e87d2b2bd09f4c053fa4f16424ec/packages/jest-resolve/src/defaultResolver.ts#L44-L54

Unfortunately, the resolve package is not very efficient for this use case: for every file it needs to resolve, it looks for package.json files in every directory and repeatedly reads and parses them: https://github.com/browserify/resolve/blob/4bece07740878577be9570efe47fde66d289b5ff/lib/sync.js#L123-L147
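
To make the upward search concrete, here is a simplified, hypothetical sketch. It is not resolve's actual code, and the helper name packageJsonCandidates is made up. It lists, nearest first, the package.json paths that a walk from a file's directory toward the filesystem root may probe (resolve stops at the first one it finds). Without caching, a walk like this is repeated for every file resolved.

```javascript
const path = require('path');

// Hypothetical helper (not part of resolve or Jest): returns the package.json
// paths an upward search would probe for `filePath`, nearest first, stopping
// before the filesystem root. POSIX paths are assumed.
function packageJsonCandidates(filePath) {
  const candidates = [];
  let dir = path.posix.dirname(filePath);
  while (dir !== '/' && dir !== '.') {
    candidates.push(path.posix.join(dir, 'package.json'));
    dir = path.posix.dirname(dir);
  }
  return candidates;
}

const probes = packageJsonCandidates('/repo/src/components/Button.ts');
// probes is:
// ['/repo/src/components/package.json', '/repo/src/package.json', '/repo/package.json']
```

In a repo with a single package.json at the root, only the last candidate exists, yet each file can trigger a stat of every directory on the way up plus a fresh read and parse of that root file.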

Our repo gives us some guarantees: there is only one package.json file outside of node_modules, we don't rely on any of its fields for module resolution, and we don't expect any of these files to change within a single run. That means we can make resolution much more efficient by bypassing the resolve package for files outside of node_modules (i.e. doing our own resolution) and caching the results. With ~80,000 TypeScript files and a single package.json that is irrelevant to how they resolve, we may be reading that same package.json off disk and calling JSON.parse on its contents roughly 80,000 times more often than necessary.
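
To put that cost in perspective, here is a hypothetical, scaled-down illustration (the function names are made up, and the file count is reduced to 1,000 so it runs quickly). Reading and parsing package.json on every lookup costs one disk read per file, while memoizing by path costs a single read. The custom resolver in this issue goes further and skips the package.json lookup entirely for local files; this sketch only illustrates the redundant work.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Counts how many times package.json is actually read from disk.
let reads = 0;

// Hypothetical uncached lookup: read and parse on every call.
function readPackageJsonUncached(pkgPath) {
  reads += 1;
  return JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
}

// Hypothetical memoized lookup: read and parse once per path per run.
const packageJsonCache = new Map();
function readPackageJsonCached(pkgPath) {
  if (!packageJsonCache.has(pkgPath)) {
    packageJsonCache.set(pkgPath, readPackageJsonUncached(pkgPath));
  }
  return packageJsonCache.get(pkgPath);
}

// Throwaway "repo" containing a single package.json.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'pkg-cache-'));
const pkgPath = path.join(root, 'package.json');
fs.writeFileSync(pkgPath, JSON.stringify({ name: 'example' }));

const FILE_COUNT = 1000; // scaled down from ~80,000 source files

for (let i = 0; i < FILE_COUNT; i++) readPackageJsonUncached(pkgPath);
const uncachedReads = reads;

reads = 0;
for (let i = 0; i < FILE_COUNT; i++) readPackageJsonCached(pkgPath);
const cachedReads = reads;

console.log({ uncachedReads, cachedReads }); // { uncachedReads: 1000, cachedReads: 1 }
fs.rmSync(root, { recursive: true, force: true });
```

Memoizing like this is only safe because, as noted above, these files are not expected to change within a single run.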

So that the caching doesn't produce stale results when files change while watch mode is running, I added a watch plugin that clears the caches whenever the filesystem changes. This mirrors how Jest integrates the default resolver with watch mode: https://github.com/facebook/jest/blob/7edfb105f3f377434d30e143a4dbcc86e541b361/packages/jest-core/src/watch.ts#L288-L289

With this resolver, my original --listTests --findRelatedTests command went from ~150s to ~15s.

Here's my code. Most of the speedup comes from avoiding the resolve package for our files outside of node_modules; caching full resolution results adds a small additional improvement on top.

```javascript
// custom-resolver.js
// graceful-fs is a drop-in replacement for Node's fs module; require('fs')
// would also work here.
const fs = require('graceful-fs');
const path = require('path');

const IPathType = {
  FILE: 1,
  DIRECTORY: 2,
  OTHER: 3,
};

// Cache of stat results (IPathType values), keyed by path.
const checkedPaths = new Map();

function statSyncCached(filePath) {
  const result = checkedPaths.get(filePath);
  if (result !== undefined) {
    return result;
  }

  let stat;
  try {
    stat = fs.statSync(filePath);
  } catch (e) {
    // A missing path (or a path through a non-directory) is expected here;
    // anything else is a real error.
    if (!(e && (e.code === 'ENOENT' || e.code === 'ENOTDIR'))) {
      throw e;
    }
  }

  if (stat) {
    if (stat.isFile() || stat.isFIFO()) {
      checkedPaths.set(filePath, IPathType.FILE);
      return IPathType.FILE;
    }

    if (stat.isDirectory()) {
      checkedPaths.set(filePath, IPathType.DIRECTORY);
      return IPathType.DIRECTORY;
    }
  }

  checkedPaths.set(filePath, IPathType.OTHER);
  return IPathType.OTHER;
}

function isFile(filePath) {
  return statSyncCached(filePath) === IPathType.FILE;
}

function isDirectory(dir) {
  return statSyncCached(dir) === IPathType.DIRECTORY;
}

// Cache of full resolution results, keyed by basedir + request.
const resolverCache = new Map();

const defaultModuleDirectory = ['node_modules'];

// Jest's default resolver uses the `resolve` npm package, which is woefully
// inefficient. The resolve package will attempt to look for package.json files
// in every directory, and repeatedly read and parse package.json files, for
// every file that it needs to resolve. Since we have some guarantees here, like
// we only have one package.json file for our repo, and we don't expect these
// files to change within a single run, we can make this a lot more efficient by
// doing a custom resolution and caching the results.
module.exports = function customResolver(request, options) {
  // Assumes POSIX-style paths (e.g. "/Users/foo/repo/bar").
  const isRequestAbs = request.startsWith('/');

  // Assumes the other resolver options (extensions, moduleDirectory, etc.) do
  // not change between calls within a run, so they are not part of the key.
  const cacheKey = isRequestAbs
    ? // If the request is an absolute path, we don't need to include the
      // basedir in the cache key, since it will not affect the resolution
      request
    : `${options.basedir};${request}`;

  const resolverCachedResult = resolverCache.get(cacheKey);
  if (resolverCachedResult !== undefined) {
    return resolverCachedResult;
  }

  const moduleDirectory = options.moduleDirectory || defaultModuleDirectory;
  const isInModuleDirectory = moduleDirectory.some((dir) =>
    isRequestAbs
      ? // If the request is an absolute path, we don't need to check the
        // basedir
        request.includes(`/${dir}/`)
      : options.basedir.includes(`/${dir}/`) || request.includes(`/${dir}/`),
  );

  // Local request paths can be of the form like "../foo" or
  // "/Users/foo/repo/bar", so we need to make sure that all of our bases
  // are covered.
  const isLocalFile = !isInModuleDirectory && (request.startsWith('.') || isRequestAbs);

  if (isLocalFile) {
    // This is a local file, so we want to do a custom fast resolution.

    const absPath = isRequestAbs ? request : path.resolve(options.basedir, request);

    if (isFile(absPath)) {
      // Path exists and is a file, so we are done here
      resolverCache.set(cacheKey, absPath);
      return absPath;
    }

    // The exact path is not a file, so try appending each configured
    // extension. If the path is a directory, also try its index file
    // (e.g. "dir/index.ts").
    const pathsToCheck = isDirectory(absPath) ? [absPath, `${absPath}/index`] : [absPath];

    // Doing this lookup ourselves keeps resolve from looking up, reading, and
    // parsing package.json files for local modules, to improve performance.
    for (let i = 0; i < pathsToCheck.length; i++) {
      for (let j = 0; j < options.extensions.length; j++) {
        const resolvedPathWithExtension = `${pathsToCheck[i]}${options.extensions[j]}`;

        if (isFile(resolvedPathWithExtension)) {
          resolverCache.set(cacheKey, resolvedPathWithExtension);
          return resolvedPathWithExtension;
        }
      }
    }

    throw new Error(`Could not resolve module ${request} from ${options.basedir}`);
  }

  // Everything else (e.g. packages in node_modules) goes through Jest's
  // default resolver, with the result cached.
  const defaultResolverResult = options.defaultResolver(request, options);
  resolverCache.set(cacheKey, defaultResolverResult);
  return defaultResolverResult;
};

// Clears both caches; called by the watch plugin when files change.
module.exports.clearCache = function clearCache() {
  checkedPaths.clear();
  resolverCache.clear();
};
```
```javascript
// custom-resolver-watch-plugin.js
const { clearCache } = require('./custom-resolver');

module.exports = class CustomResolverWatchPlugin {
  apply(jestHooks) {
    let isFirstRun = true;

    jestHooks.onFileChange(() => {
      // Clear the resolver caches whenever the filesystem changes so that
      // files are resolved correctly again.

      if (isFirstRun) {
        // This fires when the watcher first starts up, but the cache doesn't
        // need to be cleared at that point. Skipping it on the first run
        // gives a small speed benefit.
        isFirstRun = false;
      } else {
        clearCache();
      }
    });
  }
};
```
