Parse your Postgres queries into a 100% type-safe AST (powered by libpg_query)

markandrus/pg-parser

 
 

@pg-nano/pg-parser

A fork of libpg-query with best-in-class type definitions and AST utilities.

import { parseQuery } from "@pg-nano/pg-parser"

const ast = await parseQuery("SELECT 1; SELECT 2")
//    ^? ParseResult

ast.version // => 160001
ast.stmts // => [{ stmt: SelectStmt, stmt_len: 8 }, { stmt: SelectStmt, stmt_location: 9 }]
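The stmt_location and stmt_len values above can be used to recover each statement's source text. The following is a minimal sketch, not part of the package: RawStmtLike and statementText are hypothetical names, and the sketch assumes that a missing stmt_location means offset 0 and a missing stmt_len means "until the end of the input", as the example output suggests. It also assumes ASCII input, since the offsets are likely byte offsets rather than JavaScript string indices.

```typescript
// Hypothetical minimal shape of one entry in `ast.stmts` (assumption for illustration).
interface RawStmtLike {
  stmt: unknown
  stmt_location?: number
  stmt_len?: number
}

// Slice the original SQL using the statement's offset and length.
function statementText(sql: string, raw: RawStmtLike): string {
  const start = raw.stmt_location ?? 0
  // Assumption: an omitted (or zero) length means "rest of the input".
  const end = raw.stmt_len ? start + raw.stmt_len : sql.length
  return sql.slice(start, end).trim()
}

const inputSql = "SELECT 1; SELECT 2"
// Hand-written stand-in for the `stmts` array shown above.
const rawStmts: RawStmtLike[] = [
  { stmt: {}, stmt_len: 8 },
  { stmt: {}, stmt_location: 9 },
]

const statementTexts = rawStmts.map((raw) => statementText(inputSql, raw))
console.log(statementTexts)
```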

Install

pnpm add @pg-nano/pg-parser

The major and minor versions of this package are meant to align with the supported PostgreSQL major and minor versions. Older and newer PostgreSQL versions may be compatible, but this is not guaranteed.

Upon install, the pre-compiled binary for your operating system and architecture will be pulled from GitHub Releases.

API

This package exports the following functions:

  • parseQuery (for asynchronously parsing a SQL string of one or more statements)
  • parseQuerySync (synchronous version of parseQuery)
  • parsePlPgSQL (for asynchronously parsing a plpgsql string)
  • parsePlPgSQLSync (synchronous version of parsePlPgSQL)
  • fingerprint (for asynchronously generating a fingerprint string that identifies a SQL string)
  • fingerprintSync (synchronous version of fingerprint)
  • splitWithScannerSync (for splitting a SQL string into one or more statements)
  • walk (for traversing the AST)
  • select (for type-safe, deep field access through dot-notation)
  • $ (for type-safe node proxy and type guards)

Note: There is no deparse function (for turning an AST back into a string) included, as this isn't needed for my use case.

Walking the AST

I've added a walk function for easy AST traversal. You can pass either a callback or a visitor object. Return false from the callback to skip the children of the current node.

Each node passed to your visitor is wrapped in a NodePath instance, which tracks the parent node and provides type guards (e.g. isSelectStmt) for type narrowing. You can access the underlying node with path.node.

import { parseQuerySync, walk, NodeTag } from "@pg-nano/pg-parser"

// Example input; any SQL string works here.
const sql = "SELECT a.b, c.* FROM t"

walk(parseQuerySync(sql), (path) => {
  path.tag // string
  path.node // the node object
  path.parent // the parent node

  if (path.isSelectStmt()) {
    // The visitor pattern is also supported.
    walk(path.node.targetList, {
      ColumnRef(path) {
        const id = path.node.fields
          .map((f) => (NodeTag.isString(f) ? f.String.sval : "*"))
          .join(".")

        console.log(id)
      },
    })

    // don't walk into the children
    return false
  }
})
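To make the traversal contract concrete, here is a self-contained sketch of the same idea on hand-written data. It is not the package's implementation: walkSketch, PathSketch, and wrappedTag are hypothetical names, and the sketch assumes that a node is an object with a single capitalized key (such as { SelectStmt: { ... } }). Unlike the real NodePath, parent here is another path object rather than the parent node.

```typescript
// Hypothetical shapes for illustration; not the package's real types.
type Fields = Record<string, unknown>

interface PathSketch {
  tag: string
  node: Fields
  parent: PathSketch | null
}

type Visit = (path: PathSketch) => boolean | void

// Assumption: a wrapped node is an object with exactly one capitalized key.
function wrappedTag(value: object): string | undefined {
  const keys = Object.keys(value)
  const first = keys[0]
  if (keys.length !== 1 || first === undefined || !/^[A-Z]/.test(first)) {
    return undefined
  }
  const inner = (value as Fields)[first]
  return typeof inner === "object" && inner !== null ? first : undefined
}

function walkSketch(
  value: unknown,
  visit: Visit,
  parent: PathSketch | null = null,
): void {
  if (typeof value !== "object" || value === null) return
  if (Array.isArray(value)) {
    for (const item of value) walkSketch(item, visit, parent)
    return
  }
  const tag = wrappedTag(value)
  let next = parent
  let fields = value as Fields
  if (tag !== undefined) {
    fields = fields[tag] as Fields
    next = { tag, node: fields, parent }
    // Returning false skips the children of this node.
    if (visit(next) === false) return
  }
  for (const child of Object.values(fields)) walkSketch(child, visit, next)
}

// Hand-written tree standing in for a parsed query.
const tree = {
  SelectStmt: {
    targetList: [
      { ResTarget: { val: { ColumnRef: { fields: [{ String: { sval: "a" } }] } } } },
    ],
    whereClause: { A_Const: { isnull: true } },
  },
}

const visited: string[] = []
walkSketch(tree, (path) => {
  visited.push(path.tag)
  // Skip everything below ResTarget, so ColumnRef is never visited.
  return path.tag !== "ResTarget"
})
console.log(visited)
```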

I've also added a select function for type-safe, deep field access through dot-notation. The field path lists field names only; do not include node type names in it.

import { select, Expr } from "@pg-nano/pg-parser"

/**
 * Given an expression node of many possible types,
 * check for a `typeName` field.
 */
const typeName = select(expr as Expr, 'typeName')
//    ^? TypeName | undefined
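The rule about leaving node types out of the path can be illustrated with a small untyped sketch. This is not how select is implemented and it has none of its type safety: selectSketch and unwrapNode are hypothetical helpers, and the sketch assumes that a node is an object with a single capitalized key wrapping its fields.

```typescript
// Assumption: a wrapped node looks like { TypeCast: { ...fields } }.
function unwrapNode(value: unknown): unknown {
  if (typeof value === "object" && value !== null && !Array.isArray(value)) {
    const keys = Object.keys(value)
    const first = keys[0]
    if (keys.length === 1 && first !== undefined && /^[A-Z]/.test(first)) {
      return (value as Record<string, unknown>)[first]
    }
  }
  return value
}

// Follow a dot-separated field path, stepping through node wrappers
// so that the path only ever names fields.
function selectSketch(node: unknown, path: string): unknown {
  let current: unknown = node
  for (const key of path.split(".")) {
    current = unwrapNode(current)
    if (typeof current !== "object" || current === null) return undefined
    current = (current as Record<string, unknown>)[key]
  }
  return current
}

// Hand-written expression nodes standing in for parser output.
const castExpr = {
  TypeCast: {
    arg: { ColumnRef: { fields: [{ String: { sval: "a" } }] } },
    typeName: { names: [{ String: { sval: "int4" } }] },
  },
}
const columnExpr = { ColumnRef: { fields: [{ String: { sval: "a" } }] } }

const castTypeName = selectSketch(castExpr, "typeName")
const castArgFields = selectSketch(castExpr, "arg.fields")
const columnTypeName = selectSketch(columnExpr, "typeName")
console.log(castTypeName, castArgFields, columnTypeName)
```

Note that the path "arg.fields" never mentions TypeCast or ColumnRef, which is the convention described above.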

The $ function is an alternative to select for field access. It returns a proxy that makes field access less verbose, and it also comes with type guards for all nodes.

import { $, walk } from "@pg-nano/pg-parser"

walk(ast, {
  SelectStmt(path) {
    for (const target of path.node.targetList) {
      const { name, val } = $(target)

      if ($.isColumnRef(val)) {
        console.log(
          name,
          $(val).fields.map(field => {
            return $.isA_Star(field) ? "*" : field.String.sval
          }).join("."),
        )
      }
    }
  }
})
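For readers unfamiliar with type guards, the narrowing that $.isA_Star performs above can be sketched with a hand-written guard. The types and the isAStar and columnRefName helpers below are hypothetical stand-ins defined only for this example; the package generates its own guards.

```typescript
// Hypothetical node shapes for the two kinds of ColumnRef field used above.
interface StringNode {
  String: { sval: string }
}
interface AStarNode {
  A_Star: Record<string, never>
}
type FieldNode = StringNode | AStarNode

// A user-defined type guard: inside the true branch, TypeScript treats
// `node` as AStarNode; in the false branch, as StringNode.
function isAStar(node: FieldNode): node is AStarNode {
  return "A_Star" in node
}

// Join the fields of a column reference into a dotted name.
function columnRefName(fields: FieldNode[]): string {
  return fields
    .map((field) => (isAStar(field) ? "*" : field.String.sval))
    .join(".")
}

const qualifiedName = columnRefName([
  { String: { sval: "users" } },
  { A_Star: {} },
])
console.log(qualifiedName)
```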

Type definitions

Every possible type that could be returned from libpg_query is defined in ast.ts. If a type is missing, it's probably because libpg_query didn't tell us about it (otherwise, please file an issue).

The type definitions are generated from the srcdata of libpg_query (the C library this package binds to). If you're interested in how they're generated, see scripts/generateTypes.ts and scripts/inferFieldMetadata.ts. For some cases, type definitions are manually specified in scripts/typeMappings.ts.

Other improvements

  • Uses prebuild-install to avoid bundling every platform's binaries into the package.
  • Added splitWithScannerSync for SQL statement splitting.
  • Generated unit tests (see snapshots of every SQL case supported by libpg_query).

Contributing

To generate the type definitions, you can use this command:

pnpm prepare:types

To compile the TypeScript bindings and the C++ addon (and recompile them on file changes), you can use this command:

pnpm dev

Otherwise, pnpm build will compile just once.

If you're editing C++ code, install compiledb and the clangd extension for VS Code. This enables the clangd language server for features like autocomplete, static analysis, and code navigation.

brew install compiledb

⚠️ Windows support: The binding.gyp file is currently broken for Windows builds. Any help would be appreciated!

License

MIT
