Where can we find examples of serializing a view as a plan? #751

Open
mchades opened this issue Nov 28, 2024 · 3 comments

Comments

@mchades

mchades commented Nov 28, 2024

One of the example use cases on the homepage has this interesting line:

Serialize a plan that represents a SQL view for consistent use in multiple systems (e.g. Iceberg views in Spark and Trino)

It's a really awesome example, but I can't find any related Substrait code in Iceberg, Spark, or Trino.

Did I miss something?

BTW, I think it would be friendlier to attach example links to the use cases on the homepage.

@ingomueller-net
Contributor

I think that these examples are "examples of potential future uses."

Interestingly enough, there has been a discussion on the Iceberg mailing list in the last few weeks to make exactly that envisioned use case a reality.

@EpsilonPrime
Member

Gluten (a Spark plugin) has modified Substrait to read Iceberg files. Upstreaming these changes into Substrait is on my list at some point:

https://github.com/apache/incubator-gluten/blob/main/gluten-substrait/src/main/resources/substrait/proto/substrait/algebra.proto#L152

@drin
Member

drin commented Dec 13, 2024

More generally, there is currently no existing example that is interesting. Making an interesting one depends on a database having an interesting way of querying a view.

I threw together a simple example using ibis and duckdb here: query-duckdb-view

A query of a view can be represented in a variety of ways: ReadRel and ExtensionLeafRel are two specific operators, and even ReadRel on its own offers a handful of approaches via its oneof read_type group of fields. The provided example just uses ReadRel.named_table (I think).
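For illustration, here is a minimal sketch of a plan that reads a view through ReadRel.named_table, written as the JSON form of the Substrait protobuf messages (protobuf's camelCase JSON mapping, as I understand it). It is hand-built rather than emitted by a real producer, and the view name sales_summary and its two-column schema are hypothetical.

```python
import json


def read_named_table_plan(view_name, column_names, column_types):
    """Build a minimal Substrait plan (as a JSON-compatible dict) whose only
    relation is a ReadRel that refers to `view_name` via named_table.

    Field names assume the protobuf JSON mapping of Plan, PlanRel, RelRoot,
    ReadRel and NamedStruct.
    """
    read_rel = {
        "read": {
            # Schema the consumer should expect when reading the view.
            "baseSchema": {
                "names": list(column_names),
                "struct": {
                    "types": list(column_types),
                    "nullability": "NULLABILITY_REQUIRED",
                },
            },
            # One arm of the `oneof read_type` group: a (possibly qualified) name.
            "namedTable": {"names": [view_name]},
        }
    }
    return {"relations": [{"root": {"input": read_rel, "names": list(column_names)}}]}


# Hypothetical view with a required i64 column and a nullable string column.
plan = read_named_table_plan(
    "sales_summary",
    ["region_id", "region_name"],
    [
        {"i64": {"nullability": "NULLABILITY_REQUIRED"}},
        {"string": {"nullability": "NULLABILITY_NULLABLE"}},
    ],
)
print(json.dumps(plan, indent=2))
```

A real producer would typically also fill in the plan's version and any extension declarations; they are left out here to keep the view reference itself visible. Nothing in the plan says "view": the name is just a table name from the plan's point of view.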

Various systems will also likely present views in different ways, though I assume many will resolve them at the catalog level: a "table name" that matches a view name will read from the view and otherwise be transparent.
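Catalog-level resolution on the consumer side could look roughly like the sketch below: every ReadRel.named_table is looked up in a catalog, and if the name belongs to a view, the read is replaced by the view's stored definition. The catalog layout, the names orders and big_orders, and storing view definitions as Substrait relation trees are all assumptions for illustration; schema checks and cycle detection are deliberately omitted.

```python
# Toy consumer-side catalog (hypothetical): "orders" is a base table and
# "big_orders" is a view whose definition is stored as a Substrait relation
# tree in JSON-compatible form.
CATALOG = {
    "orders": {"kind": "table"},
    "big_orders": {
        "kind": "view",
        "definition": {
            "filter": {
                "input": {"read": {"namedTable": {"names": ["orders"]}}},
                "condition": {},  # filter expression elided in this sketch
            }
        },
    },
}


def resolve_views(rel, catalog):
    """Return a tree in which every ReadRel.named_table that names a view is
    replaced by that view's (recursively resolved) definition. Reads of base
    tables are returned unchanged, so the view stays transparent to the rest
    of the plan. Cyclic view definitions would recurse forever here."""
    if isinstance(rel, dict):
        read = rel.get("read")
        if isinstance(read, dict) and "namedTable" in read:
            name = ".".join(read["namedTable"]["names"])
            entry = catalog.get(name)
            if entry is None:
                raise KeyError(f"unknown table or view: {name}")
            if entry["kind"] == "view":
                return resolve_views(entry["definition"], catalog)
            return rel
        return {key: resolve_views(value, catalog) for key, value in rel.items()}
    if isinstance(rel, list):
        return [resolve_views(item, catalog) for item in rel]
    return rel


# A plan that reads "big_orders" as if it were an ordinary table.
plan = {
    "relations": [
        {"root": {"input": {"read": {"namedTable": {"names": ["big_orders"]}}},
                  "names": ["id"]}}
    ]
}
resolved = resolve_views(plan, CATALOG)
print(resolved["relations"][0]["root"]["input"])
```

The producer never has to know that big_orders is a view; only the consumer's catalog does, which matches the "otherwise transparent" behavior described above.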

Altogether, a logical example would be:

  1. Produce a Substrait plan that specifies the name of a view in either a ReadRel or an ExtensionLeafRel.
  2. When consuming the Substrait plan, either:
    • resolve the view directly (if the plan explicitly mentions a view name), or
    • resolve the view indirectly (e.g. if the plan specifies the view via ReadRel.named_table).
  3. The query completes as usual.
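The three steps can be sketched end to end as follows. This is a toy model, not a real producer or consumer: the ExtensionLeafRel detail uses a made-up message type (the type URL type.googleapis.com/example.ViewReference and its viewName field are hypothetical), and the "engine" is just a dictionary of Python functions that return rows.

```python
# Hypothetical type URL for a producer-defined message that names a view.
VIEW_REF_TYPE = "type.googleapis.com/example.ViewReference"


def produce_plan(view_name, direct):
    """Step 1: emit a plan naming the view, either explicitly through an
    ExtensionLeafRel (direct=True) or through ReadRel.named_table."""
    if direct:
        # Protobuf JSON form of google.protobuf.Any: "@type" plus the fields.
        leaf = {"extensionLeaf": {"detail": {"@type": VIEW_REF_TYPE, "viewName": view_name}}}
    else:
        leaf = {"read": {"namedTable": {"names": [view_name]}}}
    return {"relations": [{"root": {"input": leaf, "names": ["region", "total"]}}]}


def leaf_view_name(leaf, views):
    """Step 2: work out which view a leaf relation refers to."""
    if "extensionLeaf" in leaf:
        detail = leaf["extensionLeaf"]["detail"]
        if detail.get("@type") != VIEW_REF_TYPE:
            raise ValueError(f"unsupported extension: {detail.get('@type')}")
        return detail["viewName"]  # direct: the plan explicitly names a view
    name = ".".join(leaf["read"]["namedTable"]["names"])
    if name in views:  # indirect: a table name that matches a view name
        return name
    # A real engine would fall back to base tables here; this toy has none.
    raise KeyError(f"{name} is not a known view")


def consume_plan(plan, views):
    """Steps 2 and 3: resolve the view, run it, and label the output columns."""
    root = plan["relations"][0]["root"]
    rows = views[leaf_view_name(root["input"], views)]()
    return [dict(zip(root["names"], row)) for row in rows]


# Toy engine: each view is a function producing rows (hypothetical data).
VIEWS = {"sales_by_region": lambda: [("east", 10), ("west", 7)]}

direct_rows = consume_plan(produce_plan("sales_by_region", direct=True), VIEWS)
indirect_rows = consume_plan(produce_plan("sales_by_region", direct=False), VIEWS)
assert direct_rows == indirect_rows
print(direct_rows)  # [{'region': 'east', 'total': 10}, {'region': 'west', 'total': 7}]
```

Both encodings end up at the same rows; the difference is only whether the consumer learns "this is a view" from the plan itself (direct) or from its own catalog (indirect).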

How a producer does (1) and how a consumer does (2) is where you'd get a variety of interesting examples (maybe). If there are particular examples you'd like, maybe you could propose them? I don't use Iceberg, Spark, or Trino, so I don't have an environment in which I can produce examples.

4 participants