Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(Node.js): Major performance bottleneck in Node.js due to fs.statSync #1477

Closed
maxpatiiuk opened this issue Dec 1, 2023 · 5 comments · Fixed by #1478
Closed

perf(Node.js): Major performance bottleneck in Node.js due to fs.statSync #1477

maxpatiiuk opened this issue Dec 1, 2023 · 5 comments · Fixed by #1478
Assignees

Comments

@maxpatiiuk
Copy link

maxpatiiuk commented Dec 1, 2023

Describe the bug

Version: 20.0.0

ts-morph calls RealFileSystemHost.fileExistsSync and RealFileSystemHost.directoryExistsSync very often.

The current implementation of those functions relies on calling fs.statSync and try/catching the error if file doesn't exist.

However, due to the current nature of Node.js (nodejs/performance#40, nodejs/performance#39), creating exceptions is very expensive.

In my case, the codemod takes a total of 40s to run, of which 22s was spent in fs.statSync creating exceptions:

Screenshot 2023-12-01 at 08 24 32

Solutions:

  • Node.js has fs.existsSync() function that allows for checking for file existence without throwing exceptions
  • Node.js's fs.statSync() has a throwIfNoEntry argument (fs.statSync(file, { throwIfNoEntry: false });), which makes it return undefined rather than throw when file does not exist.

of course, the situation is a bit more complicated by the fact that node.js file system is abstracted away behind an adapter in ts-morph, and adapter has to be as small as possible. As a solution, I would suggest adding existsSync to the adapter, as an optional function (for backwards compatibility), and implementing it using node's existsSync - but in cases when custom adapter was used and fs.existsSync wasn't provided, fallback to current behavior with fs.statSync.


There is an excellent article on Node.js performance that describes the exact issue ts-morph is experiencing - https://marvinh.dev/blog/speeding-up-javascript-ecosystem-part-2/#:~:text=The%20cost%20of%20capturing%20stack%20traces


For node.js projects, until this bug is fixed, the workaround looks like this:

import { Project } from "ts-morph";
import { existsSync, statSync } from "node:fs";

const project = new Project();
const fileSystem = project.getFileSystem();
fileSystem.fileExistsSync = existsSync;
fileSystem.directoryExistsSync = (file) => {
  const stat = statSync(file, { throwIfNoEntry: false });
  return stat?.isDirectory() ?? false;
};

To Reproduce

By the nature of the performance issue, it requires a larger codebase to reproduce noticeably.

import { Project } from "ts-morph";

const project = new Project();
const sourceFile = project.addSourceFileAtPath("someFileWithImportsAndTypeScriptErrors.ts");
// This function in particular causes a lot of fs.existsSync calls:
sourceFile.getPreEmitDiagnostics();

Expected behavior

Checking for file existence should not be a major bottleneck.

@maxpatiiuk
Copy link
Author

Less critical, but out of 40s, 2s was spent in this function, thus if possible it should be optimized better too:

/**
* Gets the standardized absolute path.
* @param fileSystem - File system.
* @param fileOrDirPath - Path to standardize.
* @param relativeBase - Base path to be relative from.
*/
static getStandardizedAbsolutePath(fileSystem: FileSystemHost, fileOrDirPath: string, relativeBase?: string): StandardizedFilePath {
return FileUtils.standardizeSlashes(path.normalize(getAbsolutePath())) as StandardizedFilePath;
function getAbsolutePath() {
if (isAbsolutePath(fileOrDirPath))
return fileOrDirPath;
if (!fileOrDirPath.startsWith("./") && relativeBase != null)
return path.join(relativeBase, fileOrDirPath);
return path.join(fileSystem.getCurrentDirectory(), fileOrDirPath);
}
}

The fact that this function creates another function inside of it is a possible culprit

Screenshot 2023-12-01 at 09 11 04

@maxpatiiuk
Copy link
Author

For reference, this is how I captured and inspected the performance:

# I am running Node.js like this:
node --cpu-prof --cpu-prof-dir ~/Downloads/node --no-warnings=ExperimentalWarning --loader @swc-node/register/esm ./myScript.ts
# But if you are dealing with .js file, this would work too:
node --cpu-prof --cpu-prof-dir ~/Downloads/node ./myScript.js
# (The performance issue is independent of using a TypeScript loader)

That creates several .cpuprofile files in the ~/Downloads/node directory. Pick the largest among those, and open it in https://www.speedscope.app/. In that tool, use the "Sanditch" view, and sort by "Self" time to see functions which are good targets for further optimization.

@dsherret
Copy link
Owner

dsherret commented Dec 2, 2023

@maxxxxxdlp thanks for benchmarking it! Feel free to submit a PR. I don't have time to spend on performance optimizations unfortunately (though I'll probably get to this eventually).

@dsherret
Copy link
Owner

dsherret commented Dec 2, 2023

Actually, I'm going to consolidate the deno code to use the node code and in the meantime fix this perf issue.

@maxpatiiuk
Copy link
Author

Can confirm the performance issue with statSync/captureLargerStackTrace is resolved in the most recent release. Thanks for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants