Description
Background material
In March of 2021, when major work on the C++ query execution machinery
in Arrow was beginning, Wes sent a message [1] to the dev list and
linked to a doc [2] with some details about the planned design. A few
months later Neal sent an update [3] about this work. However, those
documents are now somewhat out of date. More recently, Wes shared
another update [4] and linked to a doc [5] regarding task execution /
control flow / scheduling. However, I think the best source of
information is the doc you linked to. The query execution work has
proceeded organically with many contributors, and efforts to document
the overall design in sufficient detail have not kept pace.
[1] https://lists.apache.org/thread/n632pmjnb85o49lyxy45f7sgh4cshoc0
[2] https://docs.google.com/document/d/1AyTdLU-RxA-Gsb9EsYnrQrmqPMOYMfPlWwxRi1Is1tQ/
[3] https://lists.apache.org/thread/3pmb592zmonz86nmmbjcw08j5tcrfzm1
[4] https://lists.apache.org/thread/ltllzpt1r2ch06mv1ngfgdl7wv2tm8xc
[5] https://docs.google.com/document/d/1216CUQZ7u4acZvC2jX7juqqQCXtdXMellk3lRrgP_WY/
[6] https://conbench.ursa.dev/
[7] https://lists.apache.org/thread/7v7vkc005v9343n49b3shvrdn19wdpj1
Execution model
- (Some query engines use a "pull"-based model, in which the data flow is inverted. There are pros and cons to both approaches; see "Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask".)
- "My personal feeling is that the pull model was good in the early query execution engines, based on processing of a single row at a time and using virtual function calls to switch between relational operators within the query. In my experience, the push model is easier to work with in both modern worlds of query execution: JIT compiled query processing and vectorized query processing." - ARROW-11591: [C++][Compute] Grouped aggregation apache/arrow#9621
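To make the pull/push distinction above concrete, here is a minimal Python sketch of the same scan-then-filter pipeline written both ways. It is illustrative only and is not Arrow's API: the names `pull_scan`, `pull_filter`, `PushFilter`, `PushSink`, `push_scan`, `input_received`, and `input_finished` are all hypothetical, and a plain list of integers stands in for a record batch.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[int]  # hypothetical stand-in for a record batch


# Pull model: every operator is an iterator. The consumer at the end of
# the pipeline drives execution by asking its upstream for the next batch.
def pull_scan(batches: Iterable[Batch]) -> Iterator[Batch]:
    for batch in batches:
        yield batch


def pull_filter(upstream: Iterator[Batch],
                pred: Callable[[int], bool]) -> Iterator[Batch]:
    for batch in upstream:
        yield [x for x in batch if pred(x)]


# Push model: the source drives execution. Each operator exposes callbacks
# that its upstream invokes whenever a batch is ready or input has ended.
class PushSink:
    def __init__(self) -> None:
        self.batches: List[Batch] = []
        self.finished = False

    def input_received(self, batch: Batch) -> None:
        self.batches.append(batch)

    def input_finished(self) -> None:
        self.finished = True


class PushFilter:
    def __init__(self, pred: Callable[[int], bool], output: PushSink) -> None:
        self.pred = pred
        self.output = output

    def input_received(self, batch: Batch) -> None:
        # Process the batch, then hand the result to the downstream operator.
        self.output.input_received([x for x in batch if self.pred(x)])

    def input_finished(self) -> None:
        self.output.input_finished()


def push_scan(batches: Iterable[Batch], output: PushFilter) -> None:
    for batch in batches:
        output.input_received(batch)
    output.input_finished()


def is_even(x: int) -> bool:
    return x % 2 == 0


DATA = [[1, 2, 3], [4, 5, 6]]

# Pull: the call to list() is what requests each batch.
pulled = list(pull_filter(pull_scan(DATA), is_even))

# Push: the scan hands batches to the filter, which hands them to the sink.
sink = PushSink()
push_scan(DATA, PushFilter(is_even, sink))
```

Both pipelines produce the same batches; what differs is which end holds the control flow. In the pull version the outermost consumer owns the loop, while in the push version the source owns it and downstream operators only react.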