Error if subqueries contain more than one result #5651
Conversation
This approach introduces complexity to subquery plans, but in the case that the subquery can be statically seen to produce only a single record for each key, we should be able to optimize it down to the original plan: reduce calls on distinct keys are turned into maps (and count specifically turns into a 1), and then filters that see constant-propagated information should be set to false and the whole branch replaced by the empty collection. It might not optimize out, but that would be an optimization bug or two to chase down. For subqueries that cannot be statically seen to produce single results, this maintains a count for each key, which seems about as harmless as you could make it. The maintained state could get 2x smaller if we could push the filter into the reduce, but that is a ways out still. |
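To make the counting scheme concrete, here is a toy sketch of the per-key check described above. It uses plain Rust collections and a hypothetical `check_single_valued` helper rather than the real differential dataflow plan, which maintains these counts incrementally:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Toy model (not the real Materialize plan): count the records the subquery
// produces for each key, and turn any count greater than one into an error,
// matching SQL's requirement that a scalar subquery yield at most one row.
fn check_single_valued<K: Hash + Eq + Clone, V>(
    records: Vec<(K, V)>,
) -> Result<Vec<(K, V)>, String> {
    let mut counts: HashMap<K, usize> = HashMap::new();
    for (key, _) in &records {
        *counts.entry(key.clone()).or_insert(0) += 1;
    }
    // Any key with multiplicity > 1 makes the whole subquery an error.
    if counts.into_iter().any(|(_, c)| c > 1) {
        Err("more than one record produced in subquery".to_string())
    } else {
        Ok(records)
    }
}
```

In the dataflow setting the same check is a reduce that counts per key, followed by a filter for counts greater than one and a map to the error.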
So I looked into the
|
Well, the issue is |
Ok, I pushed up a fix for that issue. That gets the server booted up properly. There is now another issue, which will be exposed when CI runs SLTs, in which this call to `unwrap` (materialize/src/transform/src/dataflow.rs, Line 168 in 7d5e096) panics due to this query:

```sql
CREATE VIEW github_2293 AS SELECT 1 AS a;
SELECT 1::decimal(38,0) / 0 / a + 1 FROM github_2293;
```

Not quite sure what code is to blame here, but that call to |
So, I'm not really sure what to do about this. The problem is that we used to assume that building a dataflow was an infallible operation, and now it is not. The flow for a, say,
So, (4) is now a fallible operation. Therefore it needs to happen before we update the catalog. Unfortunately that doesn't work, because building a dataflow depends on having an updated catalog. I think the proper fix for this probably involves interactive catalog transactions. E.g.:

```rust
let plan = plan(sql_string)?;
let plan = optimize(plan)?;
let cat_tx = begin_catalog_transaction();
cat_tx.create_view(plan.view_name);
cat_tx.create_index(plan.index_name, plan.view_name);
let dataflow = build_index_dataflow(cat_tx, plan.index_name)?;
cat_tx.commit();
ship_dataflow(dataflow);
```

That's going to be a heavy lift though. We don't have anything like that |
@benesch It seems to me that MaterializeInc/database-issues#1715 is an example of either point 4 or point 5 from your list above. |
I think MaterializeInc/database-issues#1715 is point 5, in that we ship a dataflow that should be fine to build but we error building it. There is a fix up for it. I'm not sure I'd call it point 4, in that I think the only "corruption" is that the catalog contains an installed dataflow that will error on construction (so, the catalog correctly persists a correct thing that tickles a bug downstream). |
Is it obvious that it should panic? Like, it is a query that can be statically determined to be an error, but that's fine and we should probably reduce it down to that error and then return the error? Or is the transform operation meant to return evaluation errors itself, rather than optimize the expression down to something that may be a literal error? |
Different statement: I don't know what the intent of the |
Yeah, agree that those two things are conflated when they ought not to be. We can maybe buy ourselves some time by handling EvalErrors more delicately. That unwrap still stresses me out though! Even if we handle EvalErrors separately, what makes that unwrap safe? |
The only remaining error type is a fixed point evaluation error, which .. yeah could still happen but is now unexpected vs |
Yeah, my thoughts exactly! |
Allow RelationExpr::Constant to represent constant errors. This makes it possible for the optimizer to stuff any errors it generates during constant folding back into the RelationExpr so that those errors occur during execution of the query rather than during optimization. This unblocks an issue uncovered in MaterializeInc#5651.
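A minimal sketch of that idea, with a hypothetical `Expr` type standing in for `RelationExpr` (the real types, variants, and error values differ): constant folding stuffs the error into the expression instead of returning it, so it only surfaces if the branch is actually evaluated.

```rust
// Toy model: a "constant" expression may carry an evaluation error, so
// constant folding records the error rather than failing optimization.
#[derive(Debug, PartialEq)]
enum Expr {
    Literal(i64),
    Div(Box<Expr>, Box<Expr>),
    // A constant known to produce an error if it is ever evaluated.
    Error(String),
}

fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Div(l, r) => match (fold(*l), fold(*r)) {
            // Capture the error in the expression instead of returning it to
            // the caller; it surfaces only if this branch runs at query time.
            (_, Expr::Literal(0)) => Expr::Error("division by zero".to_string()),
            (Expr::Literal(a), Expr::Literal(b)) => Expr::Literal(a / b),
            (l, r) => Expr::Div(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

With this shape, a query like `SELECT 1 / 0` optimizes to an error constant and the error is reported at execution, not during planning.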
I rebased this on #5669 locally and SLT is largely happy, modulo an errant subquery error or two that may or may not be legit! |
* transform: don't propagate expression errors in ColumnKnowledge

  The ColumnKnowledge transformation looks at every literal in the expression. If any of those literals are errors, it returns those errors to the client immediately. This behavior is arguably wrong, because those errors might never be evaluated. Rewire ColumnKnowledge to instead pack these errors into the DatumKnowledge it builds, rather than returning them to the client.

* expr: allow constant errors

  Allow RelationExpr::Constant to represent constant errors. This makes it possible for the optimizer to stuff any errors it generates during constant folding back into the RelationExpr so that those errors occur during execution of the query rather than during optimization. This unblocks an issue uncovered in #5651.

* strip lifted maps from first constant row
* improve err display
* more bugfixes
* ensure all workers build the same dataflow
* fold constant errors too
* correct some pgtests again

Co-authored-by: Frank McSherry <fmcsherry@me.com>
Thanks @benesch. This now starts up and has the following behavior:
Related: should we have a policy of requiring PRs to contain any @philip-stoev discovered errors as regression tests? |
Okies, now that it is up and running, there are naturally lots of regressions in query plans. Time to check those out. |
The only regressions are in

Edit: that wasn't entirely correct. TPCH Q02 seems to have a regression; I'll look into that.

Edit: although the plans are currently gross, I don't foresee performance problems from making things around global aggregates gross, because there should only be very few records in there. |
I think I see the issue here, which is that the multiplicity counting happens after the "outer join" work that jams on nulls if a group was empty. That isn't helpful, because we lose key information and clearly adding on those nulls only takes the count from zero to one, not to anything beyond. I'm going to look and see if it is easy to intervene and stitch in an error before that happens. |
For concreteness, here is the plan fragment that has me worried, from TPCH Q02:
Notice how |
Ah, so what is happening here is that TPCH Q02 is a global aggregation, with no key, although it is correlated. As such, it gets its |
Random additional thought: this can be implemented at nearly zero overhead, if instead of doing

```rust
select
    .project(input.arity())
    .count()
    .filter(|x| x > 1)
    .map(err)
```

we do

```rust
select
    .project(input.arity())
    .concat(input.negate())
    .threshold()
    .map(err)
```

The stateful operator in here, The

edit: the |
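To see why the second plan flags exactly the keys with count greater than one, here is a toy signed-multiset model. The function name is hypothetical; the real operators are differential dataflow's `concat`, `negate`, and `threshold`:

```rust
use std::collections::HashMap;

// Toy model: `concat` adds multiplicities, `negate` flips their sign, and
// `threshold` keeps only records with positive multiplicity. A key survives
// `select.concat(input.negate()).threshold()` exactly when `select` contains
// it more often than `input` does, i.e. when its count exceeds one per input
// record, so the surviving keys are precisely the erroneous ones.
fn excess_keys(select: &[i64], input: &[i64]) -> Vec<i64> {
    let mut mult: HashMap<i64, i64> = HashMap::new();
    for k in select {
        *mult.entry(*k).or_insert(0) += 1; // select
    }
    for k in input {
        *mult.entry(*k).or_insert(0) -= 1; // concat(input.negate())
    }
    let mut out: Vec<i64> = mult
        .into_iter()
        .filter(|&(_, m)| m > 0) // threshold()
        .map(|(k, _)| k)
        .collect();
    out.sort();
    out
}
```

Here `threshold` is the only stateful operator, and its state is proportional to the (hopefully empty) set of offending keys rather than a count per key.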
I haven't been following this discussion super closely, @frankmcsherry. Let me know if I can help rubber duck things, or somesuch! |
There are some plan regressions when subqueries use key-less aggregation (pretty common in tpch). Also, testdrive is failing with:
|
The plan regressions would be pretty easy to fix with a transform analysis, but they are vexed by the fact that we only know how to produce "key" information by calling |
I put in a fair bit of a hack to |
@benesch it's pretty non-obvious what the query plans are for SHOW DATABASES. Searching around in the codebase did not reveal where |
@frankmcsherry the test recorded the situation as it was back then. I will push a fix shortly. |
Hi @wangandi! I am going to try and clean up the conflicts here, but I thought it might be good to have you take a look at it, as it touches several things we've discussed recently. I may try and break it into a few PRs; the commits here are trying to sort out if it was possible without breaking all the tests, and I think so. However, it may be worth breaking out a few stages:
I'll clean up the conflicts, and if it seems appropriate I can re-do as a few PRs with the above structure (until I can't, and then I'll come back explaining how it is hard and I failed). |
This is now erroring on a test

where it returns the empty set, but the test expects to see an error. The subquery doesn't look like it should return multiple results. I changed it to use the

It was also erroring on the |
// TODO: perhaps ensure that (above) A.proj(key) is a
// subset of B, as otherwise there are negative records
// and who knows what is true (not expected, but again
// who knows what the query plan might look like).
Is this why we are looking for such a complicated pattern instead of just `B - A`, where B has a unique key and A also has the same unique key?
I'm not sure I follow exactly, but... if A is not a subset of B then there could be negative records.
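To make the subset concern concrete, here is a toy multiset sketch (a hypothetical helper, not Materialize code) of the multiplicities in `B - A`:

```rust
use std::collections::HashMap;

// Toy model of B - A over multisets. If A is a (multiset) subset of B, every
// resulting multiplicity is non-negative; otherwise some records come out
// with negative multiplicity, and what downstream operators do with a
// negative record depends entirely on the plan, which is the "who knows
// what is true" worry in the comment above.
fn b_minus_a(b: &[i64], a: &[i64]) -> HashMap<i64, i64> {
    let mut mult: HashMap<i64, i64> = HashMap::new();
    for k in b {
        *mult.entry(*k).or_insert(0) += 1;
    }
    for k in a {
        *mult.entry(*k).or_insert(0) -= 1;
    }
    // Drop records that cancel exactly, as a differential collection would.
    mult.retain(|_, m| *m != 0);
    mult
}
```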
This PR demonstrates (perhaps incorrectly) how subqueries can be checked for multiple results, and yield errors if more than one record is returned for any key. I haven't worked in this code for a while, so it is possible that the random numbers I've picked are no longer the right ones, and there are better ways to do things.
The gist is that before returning results in a SELECT subquery, we also count the number of occurrences of each of the leading attributes. Any counts greater than one result in an error. The error could be enriched to present both the keys and the count, with some more attention.

This currently makes MZ unrunnable, with the error