Skip to content

Replace node-sql-parser with sqlparser-ts#61111

Open
guan404ming wants to merge 1 commit intoapache:mainfrom
guan404ming:replace-with-sqlparser-rs
Open

Replace node-sql-parser with sqlparser-ts#61111
guan404ming wants to merge 1 commit intoapache:mainfrom
guan404ming:replace-with-sqlparser-rs

Conversation

@guan404ming
Copy link
Member

@guan404ming guan404ming commented Jan 27, 2026

TL;DR

Replaced node-sql-parser with sqlparser-ts
This new one is 14x smaller, 5x faster and is a library I actively maintain.

Why

  • node-sql-parser adds a massive ~88 MB footprint to the package just to handle SQL string detection.
  • The existing implementation lacked the broad SQL dialect coverage needed for diverse user queries.

How

  • Switched to sqlparser-ts, a library I develop and maintain, which brings the power of Rust’s sqlparser from Apache DataFusion to JS/TS via WebAssembly. It offers superior performance with zero JS/TS dependencies.
  • Refactored detectLanguage.ts to use sqlparser-ts's validate() function instead of node-sql-parser's parse() for SQL detection
  • Added comprehensive unit tests for detectLanguage covering JSON, SQL, Bash, YAML and plain text detection

Benchmark

This optimization slashed the package size by 93% and boosted validation speed by 5.4x, while delivering superior SQL compatibility and maintaining stable load times compare to pure js implementation.

Metric Before After Change
Package size 87 MB 6.4 MB -93%
JS bundle 7,846 KB 5,248 KB -33%
Validate speed 70.6 μs/call 12.86 μs/call 5.48x faster
Build time ~44s ~27s -39%
Load time ~459.6ms ~463.9ms +0.94%
Compatibility fail in complex SQL in SQLite near fully support and update actively -

More detailed benchmark between sqlparser-ts and node-sql-parser --> https://github.com/guan404ming/sqlparser-ts/tree/main/benchmark

Bundle Analyzer Treemap

before

image

after

image
Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Code with Opus 4.5 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@boring-cyborg boring-cyborg bot added the area:UI Related to UI/UX. For Frontend Developers. label Jan 27, 2026
@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch 3 times, most recently from fd75111 to 63e11de Compare January 27, 2026 11:01
@guan404ming
Copy link
Member Author

guan404ming commented Jan 27, 2026

Load Time Benchmark

Result

Metric main (node-sql-parser) this pr (sqlparser-ts)
Avg Page Load 459.6ms 463.9ms
Median Page Load 446.3ms 450.2ms
Min 382.0ms 359.1ms
Max 549.0ms 643.2ms
Avg DOM Content Loaded 456.4ms 460.3ms
Individual Values 549, 439, 382, 390, 433, 438, 453, 519, 469, 524 443, 452, 368, 359, 406, 449, 480, 574, 643, 467

No measurable performance difference. Both branches are within noise margin (~4ms difference in median)


Config

  • Browser: Playwright Chromium (Default)
  • Runs: 10 (with 1 warmup run excluded)
  • Wait Condition: networkidle
  • test page: TI render template tab with sql inside
image

Playwright Script

async (page) => {
  const url = 'http://localhost:28080/dags/long_log_dag/runs/manual__2026-01-27T09:58:47.311305+00:00/tasks/execute_query/rendered_templates';
  const runs = 10;
  const results = [];

  // Warm up
  await page.goto(url, { waitUntil: 'networkidle' });

  for (let i = 0; i < runs; i++) {
    await page.goto(url, { waitUntil: 'networkidle' });

    const timing = await page.evaluate(() => {
      const nav = performance.getEntriesByType('navigation')[0];
      return {
        pageLoad: nav.loadEventEnd - nav.startTime,
        domContentLoaded: nav.domContentLoadedEventEnd - nav.startTime,
      };
    });

    results.push(timing);
  }

  const avg = (arr) => (arr.reduce((a, b) => a + b, 0) / arr.length).toFixed(1);
  const median = (arr) => {
    const s = [...arr].sort((a, b) => a - b);
    const m = Math.floor(s.length / 2);
    return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
  };

  const p = results.map(r => r.pageLoad);
  const d = results.map(r => r.domContentLoaded);

  return {
    runs,
    pageLoad: { 
      avg: avg(p), 
      median: median(p).toFixed(1), 
      min: Math.min(...p).toFixed(1), 
      max: Math.max(...p).toFixed(1), 
      values: p.map(v => v.toFixed(1)) 
    },
    domContentLoaded: { 
      avg: avg(d), 
      median: median(d).toFixed(1), 
      min: Math.min(...d).toFixed(1), 
      max: Math.max(...d).toFixed(1) 
    },
  };
}

Copy link
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @guan404ming for this proposal, indeed such optimization would be a good addition I believe.

I don't feel super confident adding a third party dependency freshly created and that hasn't a track record of 'working' in production. (and being popular amongst the community).

Do you know if there are more popular alternatives? If there is not what would it take to directly embed datafusion-sqlparser-rs within airflow without relying on an external package?

Just exploring our options here.

@guan404ming
Copy link
Member Author

guan404ming commented Jan 30, 2026

Thanks for the feedback! Totally valid concern regarding the new dependency.

Do you know if there are more popular alternatives?

Here is some alternatives but most of them are either support specific dialects or lack of maintenance. After this investigation, I then decide to start to build by myself.

  • node-sql-parser --> current pkg, large and have some compat issue for different dialects
  • pgsql-parser --> only support pg
  • dt-sql-parser --> build for flink / spark
  • js-sql-parser --> outdated, last release is 2 years ago
  • antlr4ts --> only provide engine. we need to built from scratch, which is high maintenance needed

If there is not what would it take to directly embed datafusion-sqlparser-rs within airflow without relying on an external package

I've also consider this option before. But since the core logic of datafusion-sqlparser-rs is written on Rust, embedding it directly would force Airflow to introduce a Rust toolchain and WASM build steps (like wasm-pack) into the CI/CD pipeline, which increases the barrier and lots code. So packaging it separately to abstracts this complexity away, keeping the Airflow focus on Python/JS is kind of better solution if we want to take speed and size advantage of the datafusion-sqlparser-rs

I don't feel super confident adding a third party dependency freshly created and that hasn't a track record of 'working' in production. (and being popular amongst the community).

Technically, I think the package is robust. I’ve ported the test suite directly from the upstream Rust repo to typescript to ensure the behavior matches the source of truth 1:1. On top of that, I’ve included comprehensive unit tests in this PR to guarantee it integrates with the Airflow UI without causing regressions.

I completely understand the hesitation with relying on a fresh personal repo. I will continue promoting this package to other projects to build a stronger track record. My ideal target is try to donate and transfer this package to the Apache Datafusion upstream. This would make it into an community-governed artifact, removing the dependency on a single individual. However I hope we can still keep this conversation open. I’d really appreciate it if you could give the pkg a spin to see the value. Thanks!

@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch 4 times, most recently from 481ca78 to 206a3fb Compare February 7, 2026 07:48
@guan404ming guan404ming force-pushed the replace-with-sqlparser-rs branch from 206a3fb to 49d1485 Compare February 7, 2026 12:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:UI Related to UI/UX. For Frontend Developers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants