Description
DataFusion creates loads of errors even on the happy path. However, as of #7434 we gather a backtrace for each error, which is rather expensive. Here is a profile dump from a prod workload:
In this workload, a LOT of time goes to bookkeeping (that's why I had the profiler running in the first place), but the backtraces alone make up 30% of the time. The plan looks like this:
logical:
Projection: ...
TableScan: ...
physical:
ProjectionExec: ..
CoalesceBatchesExec: target_batch_size=8192
FilterExec: ...
ParquetExec: ...
(had to remove a good amount of details due to data protection, but the filters / predicates are rather simple)
I think there are two paths forward:
- do NOT generate backtraces for errors
- do NOT use errors for the happy path, but rather `Option` or some other enum
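
To illustrate the second option, here is a minimal sketch of the difference between an error-based and an `Option`-based lookup. It uses only the Rust standard library; `LookupError`, `find_column_err`, and `find_column_opt` are hypothetical names made up for this example and are not DataFusion APIs. The eager `Backtrace::force_capture()` call stands in for the per-error backtrace described above and is an assumption about the general shape of the cost, not a copy of the actual implementation.

```rust
use std::backtrace::Backtrace;
use std::collections::HashMap;

/// Hypothetical error type (not a DataFusion type) that eagerly
/// captures a backtrace whenever it is constructed.
#[derive(Debug)]
pub struct LookupError {
    pub message: String,
    pub backtrace: Backtrace,
}

/// Error-based lookup: every miss allocates a message and walks the
/// stack, even if the caller only wants to know "is it there?".
pub fn find_column_err(
    schema: &HashMap<String, usize>,
    name: &str,
) -> Result<usize, LookupError> {
    schema.get(name).copied().ok_or_else(|| LookupError {
        message: format!("column '{name}' not found"),
        backtrace: Backtrace::force_capture(),
    })
}

/// Option-based lookup: a miss is just `None`, with no allocation
/// and no stack walk.
pub fn find_column_opt(schema: &HashMap<String, usize>, name: &str) -> Option<usize> {
    schema.get(name).copied()
}
```

A caller that treats "not found" as a normal outcome can use `find_column_opt` and only build a real error (with whatever diagnostics are wanted) at the point where the miss is actually a failure, for example via `ok_or_else`.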