Skip to content

Performance issues with many logical conditions #52

@imathews

Description

@imathews

Hi, and thanks for all the great work on this library! We've recently run into some pretty significant performance issues with complex logical conditions (first observed in 0.17.1, and still present on the latest 0.21.2 release).

I'm including the reproducible snippet below, where parsing a single CASE statement takes ~900ms (latest Chrome on an M1 MBP). Converting the statement to a simple logical expression improves performance a bit (~250ms), but is still pretty slow. In the real-world example that this code is derived from, the CASE statement contains ~36 conditions, which leads to a parse time of around 45 seconds.

I've tested this with different dialects, and all show the same results, except for postgres, which continued to spin for well over a minute before I stopped it. Please let me know if there's any additional information that I can provide that would be helpful.

-- query1
  SELECT
    CASE
      WHEN ((t2.dgns_prcdr=1) AND (((t26.dgns_prcdr=1) AND (t2.complication=t26.complication)) OR ((t27.dgns_prcdr=1) AND (t2.complication=t27.complication)) OR ((t28.dgns_prcdr=1) AND (t2.complication=t28.complication)) OR ((t29.dgns_prcdr=1) AND (t2.complication=t29.complication)) OR ((t30.dgns_prcdr=1) AND (t2.complication=t30.complication)) OR ((t31.dgns_prcdr=1) AND (t2.complication=t31.complication)) OR ((t32.dgns_prcdr=1) AND (t2.complication=t32.complication)) OR ((t33.dgns_prcdr=1) AND (t2.complication=t33.complication)) OR ((t34.dgns_prcdr=1) AND (t2.complication=t34.complication)) OR ((t35.dgns_prcdr=1) AND (t2.complication=t35.complication)) OR ((t36.dgns_prcdr=1) AND (t2.complication=t36.complication)) OR ((t37.dgns_prcdr=1) AND (t2.complication=t37.complication)) OR ((t38.dgns_prcdr=1) AND (t2.complication=t38.complication)) OR ((t39.dgns_prcdr=1) AND (t2.complication=t39.complication)) OR ((t40.dgns_prcdr=1) AND (t2.complication=t40.complication)) OR ((t41.dgns_prcdr=1) AND (t2.complication=t41.complication)) OR ((t42.dgns_prcdr=1) AND (t2.complication=t42.complication)) OR ((t43.dgns_prcdr=1) AND (t2.complication=t43.complication)) OR ((t44.dgns_prcdr=1) AND (t2.complication=t44.complication)) OR ((t45.dgns_prcdr=1) AND (t2.complication=t45.complication)) OR ((t46.dgns_prcdr=1) AND (t2.complication=t46.complication)) OR ((t47.dgns_prcdr=1) AND (t2.complication=t47.complication)) OR ((t48.dgns_prcdr=1) AND (t2.complication=t48.complication)) OR ((t49.dgns_prcdr=1) AND (t2.complication=t49.complication)))) THEN 1
     ELSE 0
    END AS dgns_prcdr
  FROM t0
 -- ... bunch of left joins for t1, t2, ...t46, not relevant to reproduce
-- query2: A bit faster if we remove the CASE statement, but still not great
  SELECT
   ((t2.dgns_prcdr=1) AND (((t26.dgns_prcdr=1) AND (t2.complication=t26.complication)) OR ((t27.dgns_prcdr=1) AND (t2.complication=t27.complication)) OR ((t28.dgns_prcdr=1) AND (t2.complication=t28.complication)) OR ((t29.dgns_prcdr=1) AND (t2.complication=t29.complication)) OR ((t30.dgns_prcdr=1) AND (t2.complication=t30.complication)) OR ((t31.dgns_prcdr=1) AND (t2.complication=t31.complication)) OR ((t32.dgns_prcdr=1) AND (t2.complication=t32.complication)) OR ((t33.dgns_prcdr=1) AND (t2.complication=t33.complication)) OR ((t34.dgns_prcdr=1) AND (t2.complication=t34.complication)) OR ((t35.dgns_prcdr=1) AND (t2.complication=t35.complication)) OR ((t36.dgns_prcdr=1) AND (t2.complication=t36.complication)) OR ((t37.dgns_prcdr=1) AND (t2.complication=t37.complication)) OR ((t38.dgns_prcdr=1) AND (t2.complication=t38.complication)) OR ((t39.dgns_prcdr=1) AND (t2.complication=t39.complication)) OR ((t40.dgns_prcdr=1) AND (t2.complication=t40.complication)) OR ((t41.dgns_prcdr=1) AND (t2.complication=t41.complication)) OR ((t42.dgns_prcdr=1) AND (t2.complication=t42.complication)) OR ((t43.dgns_prcdr=1) AND (t2.complication=t43.complication)) OR ((t44.dgns_prcdr=1) AND (t2.complication=t44.complication)) OR ((t45.dgns_prcdr=1) AND (t2.complication=t45.complication)) OR ((t46.dgns_prcdr=1) AND (t2.complication=t46.complication)) OR ((t47.dgns_prcdr=1) AND (t2.complication=t47.complication)) OR ((t48.dgns_prcdr=1) AND (t2.complication=t48.complication)) OR ((t49.dgns_prcdr=1) AND (t2.complication=t49.complication)))) AS dgns_prcdr
  FROM t0
 -- ... bunch of left joins for t1, t2, ...t46, not relevant to reproduce
parser.parse(query1, { dialect: 'bigquery' })
// -> 900 ms

parser.parse(query2, { dialect: 'bigquery' })
// -> 250 ms

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions