Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TestLogic/testdata/views fails with COCKROACH_DISTSQL_MODE=on #12532

Closed
RaduBerinde opened this issue Dec 21, 2016 · 19 comments · Fixed by #12836
Closed

TestLogic/testdata/views fails with COCKROACH_DISTSQL_MODE=on #12532

RaduBerinde opened this issue Dec 21, 2016 · 19 comments · Fixed by #12836
Assignees

Comments

@RaduBerinde
Copy link
Member

In distsql mode, this pair of statements fails as below:

CREATE VIEW dt AS SELECT d, t FROM t WHERE d > DATE '1988-11-12' AND t < TIMESTAMP '2017-01-01'
SELECT * FROM dt
    	logic_test.go:1274: testdata/views:338: expected success, but found
    		pq: unsupported comparison operator: <date> > <string>

This likely has to do with the serialization of the expressions - DATE, TIMESTAMP disappear:
image

@RaduBerinde RaduBerinde self-assigned this Dec 21, 2016
@RaduBerinde
Copy link
Member Author

The problem can be reproduced without views, just with the statement

SELECT d, t FROM test.t WHERE (d > DATE '1988-11-12') AND (t < TIMESTAMP '2017-01-01 00:00:00+00:00')

But this is a more tricky to fix than I thought. If I make things like DDate format as DATE '...' (similar to interval), the statement above works with distsql, but then in the view case, we get DATE, TIMESTAMP twice:

pq: failed to parse underlying query from view "dt": syntax error at or near "DATE"
    		SELECT d, t FROM test.t WHERE (d > DATE DATE '1988-11-12') AND (t < TIMESTAMP TIMESTAMP '2017-01-01 00:00:00+00:00')

Note that this issue already exists on master with interval:

CREATE VIEW lol AS SELECT d, t FROM t WHERE d > d + INTERVAL '10h'
SELECT * FROM lol
pq: failed to parse underlying query from view "lol": syntax error at or near "INTERVAL"
    		SELECT d, t FROM test.t WHERE d > (d + INTERVAL INTERVAL '10h0m0s')

CC @a-robinson, @knz
Where does the DATE, TIMESTAMP come from in the view case? And what would be the correct way to make these show up properly, while still having them show up when just serializing (formatting) an arbitrary expression containing a DDate etc?

@a-robinson
Copy link
Contributor

Interesting, thanks for all the details.

I believe the redundant DATE/TIMESTAMP keywords when you add the type to DDate's format method are coming from the ColumnType's format method (which gets called by (*CastExpr).Format).

I don't know much of anything about the distsql serialization, so I can't say for sure what the correct way to resolve this is. However, it seems more accurate to me to keep the printing of types in CastExpr rather than in the Datum types, such that we only pretty-print explicit types when the user explicitly typed a literal. That would mean removing the INTERVAL literal from (*DInterval).Format and solving the typing for serialization another way. It may be that we shouldn't be using the same Datum Format method for both serialization and pretty-printing. But again, I haven't looked at the serialization code at all.

@RaduBerinde
Copy link
Member Author

RaduBerinde commented Dec 28, 2016

We aren't doing any specific serialization - we just Format the expression and expect it to parse back to the same thing. So for e.g. a DInterval in an expression, we need to format it as INTERVAL '..' otherwise the formatted expression does not parse.

I don't understand why we have a CastExpr in this case? I thought INTERVAL '..' parses to just a DInterval, not a CastExpr.

@a-robinson
Copy link
Contributor

We handle INTERVAL inconsistently in parsing, it turns out.

In most uses, we parse it as a CastExpr (which leads to the double printing of the INTERVAL constant):

| const_interval SCONST opt_interval

We handle it differently when it's part of a timezone value, though (which currently necessitates the inclusion of the INTERVAL constant in DInterval's Format method):

| const_interval SCONST opt_interval

@RaduBerinde
Copy link
Member Author

I'm still confused about something - in that case CastExpr.Expr is a StrVal, not a DInterval.. If at some point we form a DInterval, it should replace the entire CastExpr, not just CastExpr.Expr no?

@a-robinson
Copy link
Contributor

Yes, that would make more sense than keeping the CastExpr around.

@RaduBerinde
Copy link
Member Author

I've done a bunch of investigation and I think I got to the bottom of this.

Take this statement:

SELECT * FROM t WHERE d > d + INTERVAL '10h';

WHERE expression after parsing has a CastExpr(StrVal):

  &ComparisonExpr
    UnresolvedName{"d"}
    &BinaryExpr
      UnresolvedName{"d"}
      &CastExpr
        &StrVal{s:"10h", bytesEsc:false, resString:"", resBytes:""}

WHERE expression in the scanNode after makePlan/expandPlan (as seen by distsql
planning) has just a DInterval:

  &ComparisonExpr
    &IndexedVar{Idx:0, container:(*sql.scanNode)(0xc420dbb000)}
    &BinaryExpr
      &IndexedVar{Idx:0, container:(*sql.scanNode)(0xc420dbb000)}
      &DInterval{Duration:duration.Duration{Months:0, Days:0, Nanos:36000000000000}}

Now for the view case:

CREATE VIEW dt AS SELECT d, t FROM t WHERE d > d + INTERVAL '10h';

In CreateView before calling Select() we have a CastExpr(StrVal), like before:

  &ComparisonExpr
    UnresolvedName{"d"}
    &BinaryExpr
      UnresolvedName{"d"}
      &CastExpr
        &StrVal{s:"10h", bytesEsc:false, resString:"", resBytes:""}

But after calling Select() we have the "faulty" CastExpr(DInterval):

  &ComparisonExpr
    UnresolvedName{"d"}
    &BinaryExpr
      UnresolvedName{"d"}
      &CastExpr{Expr:(*DInterval)(0xc420a05b80), Type:(*IntervalColType)(0x2b5b4a0), typeAnnotation:typeAnnotation{typ:tInterval{}}, syntaxMode:2}
        &DInterval{Duration:duration.Duration{Months:0, Days:0, Nanos:36000000000000}}

What is happening is fairly subtle:

  • CastExpr.TypeCheck modifies CastExpr in place and changes CastExpr.Expr
    to a DInterval; @nvanbenschoten I see that many TypeCheck implementations
    make in-place modifications.. is it required to return the same expr or is it
    ok to create a brand new CastExpr and return that?

  • when creating the select node, we normalize the WHERE expression. The
    normalization includes evaluating constant expressions which causes
    CastExpr(DInterval) to be replaced with a DInterval. This explains why in
    the normal path, we don't see the double INTERVAL problem.

In the view case, the former "leaks" into the statement. The latter does not
(the normalized expression is a separate expression used internally by the
selectNode).

I see a bunch of potential fixes to CastExpr.TypeCheck:

  • change it to return a new CastExpr instead of modifying in place

  • the modified CastExpr should have syntaxMode reset to castExplicit (at
    least in the case where it is castPrefix); the resulting expression
    CAST(INTERVAL '..' AS INTERVAL) would be valid.

  • make it elide the CastExpr and directly return the inner expression if its
    resolved type matches the cast type.

@knz, @nvanbenschoten I am inclined to do all of the above.. Thoughts?

@nvanbenschoten
Copy link
Member

Nice job tracking this issue down @RaduBerinde. Before coming up with the exact fix, I think we first need to resolve the inconsistency you pointed out with timestamp and date datums formatting without their type in front, unlike interval datums. Clearly, these need to all act the same way, and we almost certainly want datums to roundtrip when printed and then parsed, so I think we'd want to go with the interval approach (ie. DDate formats as DATE '...', etc.).

Once that issue is resolved, I think it would be sufficient to make a small modification to your third fix dealing with CastExpr.TypeCheck. We could elide the CastExpr and directly return the inner expression if the inner expression is a Constant who can become the cast type. This would play nicely with the fix we made above and would avoid any issues with serialization rountripping for the discussed types. We might, however, run into issues with DStrings, which could result in introduced ambiguity (ie. 'ABC'::STRING vs. 'ABC'). To get around this I think we could wrap the returned expression from CastExpr.TypeCheck for this new case in a type annotation. I think this approach would actually work even if we instead decided to change DInterval.Format to format without the type name.

@RaduBerinde
Copy link
Member Author

On the first part - of course. It just needs to be done in the same PR with the other fix (it would break an existing test mentioned above).

I see the ambiguity problem. What about the other proposal - can't CastExpr.TypeCheck return a new CastExpr instead of modifying in place? (one that also has a "corrected" syntaxMode)

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Dec 30, 2016 via email

@RaduBerinde
Copy link
Member Author

No correctness issues, just form - the expression stored in the view will have this weird extra cast (I am guessing the user can list the views so it would be visible).

@nvanbenschoten
Copy link
Member

Returning to the eliding the CastExpr idea for constants who can become the cast type, this would actually be completely possible without the added TypeAnnotation I mentioned earlier if we made DString and DBytes format with their types as well. I think this would also be beneficial for #12611, as it would remove any ambiguity introduced from serializing those types.

@RaduBerinde
Copy link
Member Author

I see. Another question I had is if these expressions should be CastExpr in the first place.. They are not intended to work in general, are they? Eg does postgres support 'INTERVAL (some expression returning a string)'?

@knz
Copy link
Contributor

knz commented Dec 31, 2016

no it doesn't.

Why not using TypeAnnotation again? It sounded to me like the right approach.

@RaduBerinde
Copy link
Member Author

Why do we need any enclosing expr - can't the parser when it encounters INTERVAL 'xx' simply call a function that creates a DInterval from 'xxx'?

@RaduBerinde
Copy link
Member Author

I am assuming that this TYPENAME <CONSTANT> format is only used for a few types (dates, timestamps, intervals), am I incorrect about that?

@nvanbenschoten
Copy link
Member

Postgres expresses that syntax with a cast, which I assume is why we did it in the first place. However, I can't think of any pressing reasons why we need to use a CastExpr instead of just doing what you suggest (before we pulled out parsing logic from CastExpr.Eval, there was more of a reason) and performing the "cast" in the parser.

@RaduBerinde
Copy link
Member Author

Do we need to support a placeholder inside the constant part? My guess is no - that would normally be done with an explicit cast, not with an expression like INTERVAL $1?

@RaduBerinde
Copy link
Member Author

I think that we can't do the cast at parse time because we need a SemaContext (for dates, timestamptz).

I'm struggling to find the correct solution for this, the problem is pretty general: if we want any expression to not change semantically after a formatting and parsing cycle, it seems that we would need to add a type annotation to most datums (even DDecimal, DFloat, not just DString/DBytes).. It shouldn't matter if the datum was obtained from an expression containing a cast or it was the result of constant folding or other expression processing.

So I'm thinking we should 1) add a type annotation to all datums that can result in ambiguous parsing and 2) retire the castPrefix syntax and do the CastExpr elision when possible.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 10, 2017
Removing the "prefix" syntax for `CastExpr`s; we now prepend the type as part of
Datum formatting (conditional on a new `FmtFlag`). This ensures that expressions
that contain `DDInterval`, `DDate` etc (without a `CastExpr`) can be formatted
and re-parsed correctly.

To avoid a redundant cast, we elide the `CastExpr` during type checking if the
argument is a constant that can become the required type.

See cockroachdb#12532 for more context.

Fixed cockroachdb#12532.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 11, 2017
Introducing a FmtFlag that disambiguates constants with a type prefix. This
ensures that expressions that contain `DDInterval`, `DDate` etc (without a
`CastExpr`) can be formatted and re-parsed correctly.

To avoid a redundant cast, we elide the `CastExpr` during type checking if the
argument is a constant that can become the required type.

The new `FmtParsable` is used when storing expressions in view and when
serializing expressions for distsql.

Fixes cockroachdb#12532.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 11, 2017
Introducing a FmtFlag that disambiguates constants with a type prefix. This
ensures that expressions that contain `DDInterval`, `DDate` etc (without a
`CastExpr`) can be formatted and re-parsed correctly.

To avoid a redundant cast, we elide the `CastExpr` during type checking if the
argument is a constant that can become the required type.

The new `FmtParsable` is used when storing expressions in view and when
serializing expressions for distsql.

Fixes cockroachdb#12532.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 12, 2017
Introducing a FmtFlag that disambiguates constants with a type prefix. This
ensures that expressions that contain `DDInterval`, `DDate` etc (without a
`CastExpr`) can be formatted and re-parsed correctly.

To avoid a redundant cast, we elide the `CastExpr` during type checking if the
argument is a constant that can become the required type.

The new `FmtParsable` is used when storing expressions in view and when
serializing expressions for distsql.

Fixes cockroachdb#12532.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 13, 2017
This change addresses a few related problems (described in more detail
in cockroachdb#12532):
 - ambiguous datum types after serialization
 - invalid prefix casts after expression processing (like `INTERVAL INTERVAL `1.5s`').

Changes:

 - a new FmtFlag adds type annotations to datums and placeholders. This ensures
   that expressions that contain `DDInterval`, `DDate` can be formatted and
   re-parsed correctly. This replaces ad-hoc type prepending like
   `INTERVAL '..'` or `DATE '..'`.

 - `CastExpr.TypeCheck` now returns the underlying type-checked expression
   directly when it is a Datum that we know for sore is of identical type.

 - `castPrefix` (renamed to `castPrepend`) now only functions if the
   underlying expression is a string literal (preventing incorrect expressions
   like `DECIMAL '1.0':::string`). `castPrefixParens` is removed (it
   was incorrect).

 - `AnnotateTypeExpr.TypeCheck` now returns the underlying type-checked
   expression rather than changing the `AnnotateTypeExpr` in-place and
   returning it (which could result in double annotations with the new
   `FmtFlag`). `AnnotateTypeExpr` is no longer a `TypedExpr` so it is never part
   of a type-checked expression (which makes sense).

 - type checking tests are extended to verify the serliazed type-checked
   expression, which verifies both serialization and type resolution.

Note that changing the prepend syntax to a `AnnotateTypeExpr` was considered
but that doesn't work for things like `DECIMAL '2.0'`.

Fixes cockroachdb#12532. Resolves some of the concerns in cockroachdb#12611.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 14, 2017
This change addresses a few related problems (described in more detail
in cockroachdb#12532):
 - ambiguous datum types after serialization
 - invalid prefix casts after expression processing (like `INTERVAL INTERVAL `1.5s`').

Changes:

 - a new FmtFlag adds type annotations to datums and placeholders. This ensures
   that expressions that contain `DDInterval`, `DDate` can be formatted and
   re-parsed correctly. This replaces ad-hoc type prepending like
   `INTERVAL '..'` or `DATE '..'`.

 - `CastExpr.TypeCheck` now returns the underlying type-checked expression
   directly when it is a Datum that we know for sure is of identical type.

 - `castPrefix` (renamed to `castPrepend`) now only functions if the
   underlying expression is a string literal (preventing incorrect expressions
   like `DECIMAL '1.0':::STRING`). `castPrefixParens` is removed (it
   was incorrect).

 - `AnnotateTypeExpr.TypeCheck` now returns the underlying type-checked
   expression rather than changing the `AnnotateTypeExpr` in-place and
   returning it (which could result in double annotations with the new
   `FmtFlag`). `AnnotateTypeExpr` is no longer a `TypedExpr` so it is never part
   of a type-checked expression (which makes sense).

 - type checking tests are extended to verify the serialized type-checked
   expression, which verifies both serialization and type resolution.

Note that changing the prepend syntax to a `AnnotateTypeExpr` was considered
but that doesn't work for things like `DECIMAL '2.0'`.

Fixes cockroachdb#12532. Helps with cockroachdb#12611.
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Jan 18, 2017
This change addresses a few related problems (described in more detail
in cockroachdb#12532):
 - ambiguous datum types after serialization
 - invalid prefix casts after expression processing (like `INTERVAL INTERVAL `1.5s`').

Changes:

 - a new FmtFlag adds type annotations to datums and placeholders. This ensures
   that expressions that contain `DDInterval`, `DDate` can be formatted and
   re-parsed correctly. This replaces ad-hoc type prepending like
   `INTERVAL '..'` or `DATE '..'`.

 - `CastExpr.TypeCheck` now returns the underlying type-checked expression
   directly when it is a Datum that we know for sure is of identical type.

 - `castPrefix` (renamed to `castPrepend`) now only functions if the
   underlying expression is a string literal (preventing incorrect expressions
   like `DECIMAL '1.0':::STRING`). `castPrefixParens` is removed (it
   was incorrect).

 - `AnnotateTypeExpr.TypeCheck` now returns the underlying type-checked
   expression rather than changing the `AnnotateTypeExpr` in-place and
   returning it (which could result in double annotations with the new
   `FmtFlag`). `AnnotateTypeExpr` is no longer a `TypedExpr` so it is never part
   of a type-checked expression (which makes sense).

 - type checking tests are extended to verify the serialized type-checked
   expression, which verifies both serialization and type resolution.

Note that changing the prepend syntax to a `AnnotateTypeExpr` was considered
but that doesn't work for things like `DECIMAL '2.0'`.

Fixes cockroachdb#12532. Helps with cockroachdb#12611.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants