Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transform: add ColumnNames attribute #21796

Merged
merged 5 commits into from
Sep 19, 2023

Conversation

aalexandrov
Copy link
Contributor

Add a ColumnNames attribute and expose it in EXPLAIN outputs through the column_names option.

Motivation

  • This PR adds a feature that has not yet been specified.

We don't have an issue for that, but multiple people have expressed interest in having an EXPLAIN plan that prints humanized column names. This PR is the first step towards that.

Tips for reviewer

See commit messages for details (specifically the message of the third commit for motivation and background of the refactoring).

The first four commits do some refactoring to the testing and attribute derivation frameworks required that is needed in order to implement the ColumnNames inference logic. More specifically, we need a way to create Attribute implementations with parameters (until now the structs implementing Attribute were all nullary). This is achieved by the third and fourth commits which:

  • Introduce a struct-based DerivedAttributes container.
  • Pass context to the attribute constructors.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@aalexandrov aalexandrov requested review from a team September 18, 2023 11:03
@shepherdlybot
Copy link

shepherdlybot bot commented Sep 18, 2023

This PR has higher risk. In addition to having a knowledgeable reviewer, it may be useful to add observability and/or a feature flag. What's This?

Risk Score Probability Buggy File Hotspots
🔴 83 / 100 65% 0

@aalexandrov aalexandrov requested review from vmarcos and removed request for a team September 18, 2023 11:03
@aalexandrov aalexandrov self-assigned this Sep 18, 2023
@aalexandrov aalexandrov added A-optimization Area: query optimization and transformation A-compute Area: compute labels Sep 18, 2023
Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, with the caveat that I do not have expertise in the attribute derivation framework (and just studied the necessary parts for reviewing). Overall, moving away from the dependency on typemap_rev seemed to be a positive development regardless, though, and I think that the refactor is amply justified.

There were a few minor questions I had while reviewing, and we can discuss below if any further changes to address those make sense or not.

src/transform/tests/test_transforms.rs Outdated Show resolved Hide resolved
src/transform/src/attribute/mod.rs Outdated Show resolved Hide resolved
src/transform/src/attribute/column_names.rs Outdated Show resolved Hide resolved
src/transform/src/attribute/column_names.rs Outdated Show resolved Hide resolved
Change the `DefSource` directive syntax from
```
DefSource name=t0 keys=[[#0]]
  - bigint
  - bigint?
```

to

```
DefSource name=t0 keys=[[#0]]
  - c0: bigint
  - c1: bigint?
```

so the `Humanizer` implementation for the `TestCatalog` can properly
answer `column_names_for_id` calls.
Use

```
explain with=(arity,types)
```

To output the explained plan with `arity` and `types` attribute annotations.

This also adds separate tests for `arity` and `types` to validate that
the new directive works as expected based on some existing attributes.
The `DerivedAttributes` and `RequiredAttributes` structs were hosting
instances of the attribute derivation algorithms for the attributes to
be derived in `TypeMap` instances.

This is preventing us from seeding the algorithm (`Attribute`) instances
with `&dyn ExprHumanizer` context because `TypeMap` keys need to be
thread-safe (`Send + Sync`).

In order to allow `Attribute` implementations to depend on context types
such as `&dyn ExprHumanizer` or `&dyn StatisticsOracle` we replace the
`TypeMap` fields with a custom struct called `AttributeStore`.

The struct has an `Option<T>` field provisioned for each `A` that
implements `Attribute`. Fields are listed in dependency order and
methods that need to respect the dependency order access them in the
same sequence. An implementation of `AttributeContainer<A>` for each
field provides a unified interfacing for accessin the attribute of type
`A` from client code. See the rustdoc comments on these items for more
details.

The change removes the following code from the `attribute` framework:

- The `typemap_rev` crate dependency (which provided `TypeMap`).
- The `AsKey` helper struct.
- The `AttributeId` enum and the `TOTAL_ATTRIBUTES` const field.
Add a `ColumnNames` attribute and expose it in `EXPLAIN` outputs through
the `column_names` option.
Copy link
Contributor

@vmarcos vmarcos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@aalexandrov aalexandrov merged commit b4f0992 into MaterializeInc:main Sep 19, 2023
@aalexandrov aalexandrov deleted the column_names_attribute branch September 19, 2023 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-compute Area: compute A-optimization Area: query optimization and transformation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants