[CT-2960] [Bug] Compile performance hit (DBT 1.4.6 -> 1.6.0) #8360
Comments
@nicklubbers Thanks for opening the issue! To confirm - are you seeing the slowness in between the Just a hunch: I'm wondering if this could be due to the change we made to write I heard a similar report the other day about a slowdown in I'm going to label this a
@jtcohen6 - Yes the slowness is between the Additional info:
is not as slow as 1.6.0 and is comparable to 1.4.6. So it looks like the slowdown was introduced in 1.6.0.
This looks like a potentially serious issue for large projects, introduced by changes to the way run_results.json is written in 1.6.
We use Python's multiprocessing.pool.Pool to run tasks. Each task result is processed individually by a callback function, which the pool fires sequentially on a single thread it maintains for that purpose. By including the file write in the callback, we are violating an important recommendation in the Python documentation: callbacks should complete immediately, because otherwise the thread that handles the results is blocked.
Writing the run results is not fast for large projects. The run results file grows as the graph is executed and more results are added, and the entire file is rewritten on every node completion. As a result, the total work done writing run_results.json is O(n²), where n is the number of nodes.
Profiling showed that this doubled the total time needed to run dbt compile on the Q-Team-Pristine-2k-Models project, and the slowdown factor will likely be larger for larger projects.
Potential fixes:
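To make the quadratic cost described in the comment above concrete, here is a minimal sketch, not dbt's actual code. It assumes a `ThreadPool` (a subclass of `multiprocessing.pool.Pool` with the same callback mechanism) so that it stays self-contained. The names `run_node`, `run_all`, and `write_results` are hypothetical, and `json.dumps` stands in for rewriting run_results.json. It counts how many result entries get serialized when the file is rewritten on every completion versus once at the end.

```python
import json
from multiprocessing.pool import ThreadPool


def run_node(node_id):
    # Hypothetical stand-in for compiling or running a single node.
    return {"unique_id": f"model.{node_id}", "status": "success"}


def run_all(n_nodes, rewrite_every_completion):
    """Run n_nodes tasks and return how many result entries were serialized."""
    results = []
    serialized = 0

    def write_results():
        nonlocal serialized
        json.dumps(results)  # stand-in for rewriting the whole results file
        serialized += len(results)

    def on_complete(result):
        # The pool invokes callbacks one at a time on its result-handler
        # thread, so slow work here delays every later result.
        results.append(result)
        if rewrite_every_completion:
            write_results()

    with ThreadPool(4) as pool:
        pending = [
            pool.apply_async(run_node, (i,), callback=on_complete)
            for i in range(n_nodes)
        ]
        for task in pending:
            task.wait()

    if not rewrite_every_completion:
        write_results()  # single write after all nodes have finished
    return serialized
```

With n nodes, rewriting on every completion serializes 1 + 2 + ... + n = n(n + 1)/2 entries, while writing once at the end serializes n. For 2,000 nodes that is 2,001,000 entries versus 2,000.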
Is this a new bug in dbt-core?
Current Behavior
After upgrading dbt from 1.4.6 to 1.6.0, our full compile time increased from 2 min 52 s to 18 min!
Expected Behavior
That the compile time would decrease or stay the same.
Steps To Reproduce
dbt compile
output:
dbt compile
output:
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
No response