Description
For a query such as select expensive_thing(col1) from data, I would like to pre-process (speed up) expensive_thing(col1).
The easiest way I can think of to do this is to pre-compute the expression and save it as a column with a specific name, like _expensive_thing__col1.
But then I have to hardcode this column into my schema, the hardcoded expressions need to be the same for every file, etc.
To me, an ideal solution would be to push the expression down to the file-reading level so that I can then check: "does _some_other_expensive_expr__col38 exist in the file? If so, read that; otherwise read col38 and compute the expression".
The tricky thing is I'd want to do this on a per-file level: depending on the data different expression/column combinations would be pre-computed; it's prohibitive to put them all in the schema that is shared amongst all files.
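
To make the per-file check concrete, here is a rough sketch of the behaviour I have in mind. It is not tied to any real reader API: each file is modelled as a plain dict of column name to values, and read_expression, precomputed_name and the body of expensive_thing are hypothetical names made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one file: column name -> column values.
FileColumns = Dict[str, List[float]]


def precomputed_name(func_name: str, column: str) -> str:
    """Naming convention from the description, e.g. _expensive_thing__col1."""
    return f"_{func_name}__{column}"


def read_expression(
    file_columns: FileColumns,
    func: Callable[[float], float],
    column: str,
) -> List[float]:
    """Resolve func(column) for a single file.

    If this file carries the pre-computed column, read it directly;
    otherwise read the raw column and evaluate the expression.
    """
    name = precomputed_name(func.__name__, column)
    if name in file_columns:
        # Fast path: this file was written with the pre-computed values.
        return list(file_columns[name])
    # Slow path: fall back to computing the expression row by row.
    return [func(value) for value in file_columns[column]]


def expensive_thing(x: float) -> float:
    # Placeholder for the real (costly) computation; assumed for this sketch.
    return x * x + 1.0


# Two files with different physical columns but the same logical schema.
file_a = {"col1": [1.0, 2.0], "_expensive_thing__col1": [2.0, 5.0]}
file_b = {"col1": [3.0]}

result_a = read_expression(file_a, expensive_thing, "col1")  # uses stored column
result_b = read_expression(file_b, expensive_thing, "col1")  # computes on the fly
```

The point is that the decision is made per file from the columns that file actually contains, so the shared table schema only needs col1 and never has to list the pre-computed columns.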