Skip to content

sql: fix serialization of datums and prefix casts #12836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 20, 2017

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Jan 10, 2017

Removing the "prefix" syntax for CastExprs; we now prepend the type as part of
Datum formatting (conditional on a new FmtFlag). This ensures that expressions
that contain DDInterval, DDate etc (without a CastExpr) can be formatted
and re-parsed correctly.

To avoid a redundant cast, we elide the CastExpr during type checking if the
argument is a constant that can become the required type.

See #12532 for more context.

Fixed #12532.

Note: the current change breaks a lot of tests that encode expressions (like TestSimplifyExpr, parse tests), I want to get feedback on the change before I go through correcting them.


This change is Reviewable

@knz
Copy link
Contributor

knz commented Jan 10, 2017

What was the rationale for having the format flag negated? (why did you chose noDatumTypes instead of showDatumTypes? We could then have showDatumTypes only when printing out the things for view descs)


Review status: 0 of 11 files reviewed at latest revision, 1 unresolved discussion.


pkg/sql/parser/datum.go, line 396 at r1 (raw file):

// Format implements the NodeFormatter interface.
func (d *DFloat) Format(buf *bytes.Buffer, f FmtFlags) {
	if !f.noDatumTypes {

Don't do this special casing here. Do it just once for all ambiguous datum types in FormatNode (format.go).


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

I figured that without them the expression may not be valid and we shouldn't print out invalid expressions. I don't have a strong opinion here though..


Review status: 0 of 11 files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jan 10, 2017

I think we ought to distinguish "pretty printing an expr for a human" and "serializing an expr in a format that is deserializable and also as much as possible readable by a human". Perhaps we should have different function names for these two things, and use them in the appropriate contexts.

@RaduBerinde
Copy link
Member Author

pkg/sql/parser/datum.go, line 396 at r1 (raw file):

Previously, knz (kena) wrote…

Don't do this special casing here. Do it just once for all ambiguous datum types in FormatNode (format.go).

But there is no generic way to do that since it applies only to some nodes (Datums with some types).. So we would have a big switch in FormatNode? Doesn't that go against the model of each node having its own Format function?

Or are you suggesting to include an explicit cast for all constants? (although that still requires a switch to only apply it to Datum nodes no?)


Comments from Reviewable

@RaduBerinde RaduBerinde force-pushed the fix-prefix-cast branch 2 times, most recently from 409cbf9 to 9a58315 Compare January 11, 2017 15:06
@RaduBerinde
Copy link
Member Author

Ok, take two. Please ignore the first two commits (I sent out a separate PR for those).

I reversed the meaning of the flag; it's now only enabled as part of a new FmtParsable, used by distsql and views. Let me know if this makes more sense.

I also found that we can't elide the CastExpr in all cases (e.g. 'hello'::char(2)). I made the check more strict to only elide the cast for types that can't have any parameters.


Review status: 0 of 14 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

@RaduBerinde RaduBerinde changed the title sql: fix formatting of prefix casts TIMESTAMP/DATE/INTERVAL sql: fix formatting/serialization of prefix casts TIMESTAMP/DATE/INTERVAL Jan 11, 2017
@nvb
Copy link
Contributor

nvb commented Jan 11, 2017

The changes here look good in general, but we should walk through all of the places where Expr.String is used and make sure it's being used correctly. I can imagine there are a number of places where it is being used when the properties of this new FmtParsable mode are actually desired.

Also, we are starting to overlap with #12611, which has related complications. It might not be worth worrying about yet, but does the second part of @knz's proposed plan there influence your design here at all? Perhaps instead of enforcing datum types through a formatting mode, we have a new tree transformation that can either annotate all expressions with TypeAnnotations ("full mode", needed for that issue) or only annotate enough to allow serialization-deserialization round-tripping ("parsable mode", needed for this issue). Just a thought, which may be mutual based on this comment from yesterday.


Reviewed 10 of 10 files at r4.
Review status: 10 of 14 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/create.go, line 387 at r4 (raw file):

	var queryBuf bytes.Buffer
	var fmtErr error
	n.AsSource.Format(

I think we need to do the same thing with persisted expressions on TableDescriptors, like DEFAULT, CHECK, and REFERENCED.


pkg/sql/parser/datum.go, line 396 at r1 (raw file):

Previously, RaduBerinde wrote…

But there is no generic way to do that since it applies only to some nodes (Datums with some types).. So we would have a big switch in FormatNode? Doesn't that go against the model of each node having its own Format function?

Or are you suggesting to include an explicit cast for all constants? (although that still requires a switch to only apply it to Datum nodes no?)

I personally am fine with this approach FWIW.


pkg/sql/parser/type_check.go, line 211 at r4 (raw file):

			// precision), the CastExpr becomes a no-op and can be elided.
			switch expr.Type.(type) {
			case *BoolColType, *DateColType, *TimestampColType, *TimestampTZColType, *IntervalColType,

long line


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

There is definitely overlap with #12611 - this change solves a big part of the problem, that of interpreting literals/constants with the same types. The other concerns - around a function changing its return type - are something that we hold more control over and might not be problems in practice; in any case, we could add on top of the infrastructure proposed here to output function return types as well.


Review status: 10 of 14 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Comments from Reviewable

@RaduBerinde RaduBerinde force-pushed the fix-prefix-cast branch 2 times, most recently from 641e4be to 3968361 Compare January 11, 2017 20:27
@RaduBerinde
Copy link
Member Author

Review status: 7 of 17 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/create.go, line 387 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think we need to do the same thing with persisted expressions on TableDescriptors, like DEFAULT, CHECK, and REFERENCED.

Thanks, great point! Done for default and check. I don't think FK references have any expressions.


pkg/sql/parser/type_check.go, line 211 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

long line

Done.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jan 12, 2017

Mostly LGTM. Will want to have a look again when all the tests pass.


Reviewed 4 of 12 files at r2, 3 of 6 files at r3, 1 of 10 files at r4, 6 of 6 files at r5.
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


pkg/sql/parser/datum.go, line 396 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I personally am fine with this approach FWIW.

Ok I've decided I'm fine either way too.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Thanks! Updated. The fixes around INTERVAL are basically a revert of my earlier misguided change #10712.

@nvb
Copy link
Contributor

nvb commented Jan 12, 2017

:lgtm: My only remaining question is why the removal of the two castSyntaxModes is still needed. Did the CastExpr nesting issue from #12532 not go away fully when we elided the CastExpr on constants?


Reviewed 1 of 6 files at r5, 7 of 7 files at r6.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

I don't think it's necessary for the cases we've seen but I figured it's an incorrect way to format a cast in general and given that we have a lot of code that manipulates expressions it's hard to say for sure that the enclosed expression will always stay a constant.

By the way, castPrefixParens seems incorrect to me - formats INTERVAL ( .. ) '1h' as INTERVAL ('1h').

I was looking at writing some tests and I am realizing this doesn't quite work as intended.. It turns out that the TYPENAME xx notation only works if xx is a string. Things like DECIMAL 1.2 don't parse.. So I think I should replace the annotations I added with actual casts (1.2::DECIMAL), what do you think?


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Jan 13, 2017 via email

@RaduBerinde
Copy link
Member Author

So 1.2:::DECIMAL correct?


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Jan 13, 2017

Right. And actually the semantics wouldn't even be the same. Casts would only desire the type, while annotations would require it. We want the latter here.

If we're going to do this approach, we can probably unify the implementation ala @knz's suggestion about FormatNode.

@RaduBerinde
Copy link
Member Author

What do you mean casts only desire the type? If the cast isn't possible, that is an "invalid cast" error no?

One thing I'm wondering about is whether we will end up with two annotations which doesn't work (unlike casts) - would it be possible to ever (after some expression manipulation perhaps) have a DString insidea a TypeAnnotation and end up with something like `'1s':::STRING:::INTERVAL'?

I was also thinking of switching these INTERVAL 'xx' casts to TypeAnnotation as well but I'm worried about the above.


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Jan 13, 2017

What I mean is that a CastExpr will try to perform whatever transformation it can to create the desired type from its subexpression, after desiring that type during the type check recursion. On the other hand, the AnnotateTypeExpr will require that type during the type check recursion. In practice when dealing with Constants, this just means that the CastExpr will be a little more permissive. For example: '123'::int will work but '123':::int wont. In this situation, we don't need (or particularly want) the extra permissiveness.

I can't think of any situation where it would be possible to get a DString inside of an AnnotateTypeExpr like you mentioned for two reasons. First, a DString inside of a type annotation for a different type would throw a type error during type checking. Second, a datum inside of a type annotation for its type will be constant evaluated during normalization.

I had the same idea a little while ago about switching the INTERVAL 'xx' casts to AnnotateTypeExpr as well, but it seemed like a big jump back then. Now that it looks like we're converging to that idea, and especially because we're considering formatting datums with type annotations, I think it's worth some consideration.

@RaduBerinde
Copy link
Member Author

Yeah I thought about it a bit and I don't think the case I mentioned could happen either. Sorry, you guys mentioned TypeAnnotation it in the issue but I was confused about what you meant (I was looking for that symbol instead of AnnotateTypeExpr). I'll give it a shot.


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Wait, it seems we will actually have the exact same problem. AnnotateTypeExpr.TypeCheck replaces .Expr in place with the type-checked expression. So INTERVAL 'xx' will become an AnnotateTypeExpr that wraps a DInterval, which would serialize as a double type annotation. This will also leak in the original expression since it's an in-place change (as we saw with cast).

Should we make AnnotateTypeExpr.TypeCheck just return the underlying expression instead?


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Argh.. I was just about to post a new version but I found the following case DECIMAL '1.5' (which we use in tests, and works in PG) doesn't work anymore if we change to a type annotation. So I guess we're stuck with using casts for these.


Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Ok, please check the latest version (and my ramblings above for context):

  • a new FmtFlag adds type annotations to datums and placeholders. This ensures
    that expressions that contain DDInterval, DDate can be formatted and
    re-parsed correctly. This replaces ad-hoc type prepending like
    INTERVAL '..' or DATE '..'.

  • CastExpr.TypeCheck now returns the underlying type-checked expression
    directly when it is a constant of the right type.

  • castPrefix (renamed to castPrepend) now only functions if the
    underlying expression is a string literal. castPrefixParens is removed (it
    was incorrect).

  • AnnotateTypeExpr.TypeCheck now returns the underlying type-checked
    expression rather than changing the AnnotateTypeExpr in-place and
    returning it (which could result in double annotations with the new
    FmtFlag). AnnotateTypeExpr is no longer a TypedExpr so it is never part
    of a type-checked expression (which makes sense).

  • type checking tests are extended to verify the serliazed type-checked
    expression, which verifies both serialization and type resolution.


Review status: 8 of 17 files reviewed at latest revision, 1 unresolved discussion.


Comments from Reviewable

@RaduBerinde RaduBerinde changed the title sql: fix formatting/serialization of prefix casts TIMESTAMP/DATE/INTERVAL sql: fix serialization of datums and prefix casts Jan 13, 2017
@nvb
Copy link
Contributor

nvb commented Jan 14, 2017

Reviewed 8 of 9 files at r7.
Review status: 16 of 17 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/parser/format.go, line 138 at r7 (raw file):

	n.Format(buf, f)
	if f.disambiguateDatumTypes {
		// We don't need annotations for types like DBool, DInt, DNull, DTuple.

We do for DInt, as 1 is still ambiguous (it can become all numeric types if desired).

I would lean towards making this a method (AmbiguousFormat() bool) so that future datum types don't miss it.


pkg/sql/parser/type_check_test.go, line 38 at r7 (raw file):

		expected string
	}{
		{`NULL + 1`, `NULL`},

Why is this being normalized?


pkg/sql/parser/type_check_test.go, line 46 at r7 (raw file):

		{`NULL + 'hello'::bytes`, `NULL`},
		{`INTERVAL '1s'`, `'1s':::interval`},
		{`(1.1::decimal)::decimal`, `(1.1:::decimal::DECIMAL)::DECIMAL`},

The reason for the different case is because your AnnotateTypeExpr uses a CastTargetType (I think it should actually be a ColumnType, but ignore that for now) while your new disambiguateDatumTypes check uses a Type. Consider changing

buf.WriteString(n.(Datum).ResolvedType().String())

to

FormatNode(buf, f, DatumTypeToColumnType(n.(Datum).ResolvedType()))

Comments from Reviewable

@RaduBerinde
Copy link
Member Author

pkg/sql/parser/type_check_test.go, line 38 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why is this being normalized?

I could ask you the same question sir! Looks like BinaryEx[r.TypeCheck does that when one of the operands has the null type.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

pkg/sql/parser/format.go, line 138 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We do for DInt, as 1 is still ambiguous (it can become all numeric types if desired).

I would lean towards making this a method (AmbiguousFormat() bool) so that future datum types don't miss it.

I figured that if the expression type checks with 1interpreted as an int, that is what it will be interpreted upon re-parsing and type-checking, am I incorrect? (I'm just curious, I will add it regardless)

I am hesitating between adding a new method and pushing this back down into Format.. Seems a bit odd to me to add a different method that controls formatting when we already have a Format method.


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Jan 14, 2017

Review status: 16 of 17 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/parser/format.go, line 138 at r7 (raw file):

Previously, RaduBerinde wrote…

I figured that if the expression type checks with 1interpreted as an int, that is what it will be interpreted upon re-parsing and type-checking, am I incorrect? (I'm just curious, I will add it regardless)

I am hesitating between adding a new method and pushing this back down into Format.. Seems a bit odd to me to add a different method that controls formatting when we already have a Format method.

I'm pretty sure you're correct in the case where nothing changes between formatting and parsing, although I still have #12611 on the back of my mind, in which case ambiguity could be introduced.

And that's a good point, though I do think a new method would help separate concerns. I'm fine with either approach, so go with whatever you like best.


pkg/sql/parser/type_check_test.go, line 38 at r7 (raw file):

Previously, RaduBerinde wrote…

I could ask you the same question sir! Looks like BinaryEx[r.TypeCheck does that when one of the operands has the null type.

Huh.. would you look at that. I guess overload resolution won't really work with NULL operands. I forsee an overhaul of NULL handling in the near future, but in the meantime, carry on!


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Updated, TFTR and for the great suggestions!


Review status: 14 of 17 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/sql/parser/format.go, line 138 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm pretty sure you're correct in the case where nothing changes between formatting and parsing, although I still have #12611 on the back of my mind, in which case ambiguity could be introduced.

And that's a good point, though I do think a new method would help separate concerns. I'm fine with either approach, so go with whatever you like best.

Done.


pkg/sql/parser/type_check_test.go, line 46 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The reason for the different case is because your AnnotateTypeExpr uses a CastTargetType (I think it should actually be a ColumnType, but ignore that for now) while your new disambiguateDatumTypes check uses a Type. Consider changing

buf.WriteString(n.(Datum).ResolvedType().String())

to

FormatNode(buf, f, DatumTypeToColumnType(n.(Datum).ResolvedType()))

Done.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Hm, I now see this for TestShowCreateView:

got: CREATE VIEW T5 AS SELECT s, count(*) FROM d.t GROUP BY s HAVING count(*) > 3:::INT
expected: CREATE VIEW T5 AS SELECT s, count(*) FROM d.t GROUP BY s HAVING count(*) > 3

Similar issue - BinaryExpr.TypeCheck modifies the expression in place and the operand becomes a DInt. Do you think this is ok from a user perspective? @knz, @a-robinson


Review status: 14 of 17 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jan 14, 2017 via email

@RaduBerinde
Copy link
Member Author

I could serialize the expression before type-checking to go around this issue. But it seems that for #12611 we would actually need to go in the opposite direction and actually serialize the type-checked expression, right? Should I do that in this change?

@RaduBerinde
Copy link
Member Author

I could serialize the expression before type-checking to go around this issue. But it seems that for #12611 we would actually need to go in the opposite direction and actually serialize the type-checked expression, right? Should I do that in this change?

Never mind this, doing either is not easy at all, I will not pursue that in this change.

@nvb
Copy link
Contributor

nvb commented Jan 17, 2017

The failing test actually seems like a good thing to me. It means that the ambiguity of the untyped numeric literal will only be considered once, and from then on, will always be guaranteed to be treated as the same type.


Review status: 14 of 18 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Is the latest version good to go?

@nvb
Copy link
Contributor

nvb commented Jan 18, 2017

:lgtm_strong:


Review status: 14 of 18 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/sql/parser/datum.go, line 166 at r8 (raw file):

// AmbiguousFormat implements the Datum interface.
func (*DBool) AmbiguousFormat() bool { return false }

nit: I'd move all of these directly above or below their type's respective Format methods.


pkg/sql/parser/type_check.go, line 211 at r8 (raw file):

			// precision), the CastExpr becomes a no-op and can be elided.
			switch expr.Type.(type) {
			case *BoolColType, *DateColType, *TimestampColType, *TimestampTZColType,

We can't do this for column types that can have parameters but don't in this case? Like 1::INT?


Comments from Reviewable

This change addresses a few related problems (described in more detail
in cockroachdb#12532):
 - ambiguous datum types after serialization
 - invalid prefix casts after expression processing (like `INTERVAL INTERVAL `1.5s`').

Changes:

 - a new FmtFlag adds type annotations to datums and placeholders. This ensures
   that expressions that contain `DDInterval`, `DDate` can be formatted and
   re-parsed correctly. This replaces ad-hoc type prepending like
   `INTERVAL '..'` or `DATE '..'`.

 - `CastExpr.TypeCheck` now returns the underlying type-checked expression
   directly when it is a Datum that we know for sure is of identical type.

 - `castPrefix` (renamed to `castPrepend`) now only functions if the
   underlying expression is a string literal (preventing incorrect expressions
   like `DECIMAL '1.0':::STRING`). `castPrefixParens` is removed (it
   was incorrect).

 - `AnnotateTypeExpr.TypeCheck` now returns the underlying type-checked
   expression rather than changing the `AnnotateTypeExpr` in-place and
   returning it (which could result in double annotations with the new
   `FmtFlag`). `AnnotateTypeExpr` is no longer a `TypedExpr` so it is never part
   of a type-checked expression (which makes sense).

 - type checking tests are extended to verify the serialized type-checked
   expression, which verifies both serialization and type resolution.

Note that changing the prepend syntax to a `AnnotateTypeExpr` was considered
but that doesn't work for things like `DECIMAL '2.0'`.

Fixes cockroachdb#12532. Helps with cockroachdb#12611.
@RaduBerinde
Copy link
Member Author

Review status: 14 of 18 files reviewed at latest revision, 3 unresolved discussions.


pkg/sql/parser/datum.go, line 166 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: I'd move all of these directly above or below their type's respective Format methods.

Done.


pkg/sql/parser/type_check.go, line 211 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We can't do this for column types that can have parameters but don't in this case? Like 1::INT?

We could, do you think it's worth it? I didn't do it because it would be a bunch of cases that depend on each type... pehaps we should add a method to CastType so each type can decide if a cast can be elided?


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Jan 20, 2017

Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/parser/type_check.go, line 211 at r8 (raw file):

Previously, RaduBerinde wrote…

We could, do you think it's worth it? I didn't do it because it would be a bunch of cases that depend on each type... pehaps we should add a method to CastType so each type can decide if a cast can be elided?

I don't think so. Let's push this as is. I actually need to open an issue for CastExpr to handle column type parameters anyway.


Comments from Reviewable

@nvb
Copy link
Contributor

nvb commented Jan 20, 2017

Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


pkg/sql/parser/type_check.go, line 211 at r8 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't think so. Let's push this as is. I actually need to open an issue for CastExpr to handle column type parameters anyway.

Follow up issue #13024.


Comments from Reviewable

@RaduBerinde RaduBerinde merged commit 55b1eb9 into cockroachdb:master Jan 20, 2017
@RaduBerinde RaduBerinde deleted the fix-prefix-cast branch January 20, 2017 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestLogic/testdata/views fails with COCKROACH_DISTSQL_MODE=on
3 participants