-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Is there any restriction to ReferenceRel.subtree_ordinal
wrt the ordinal of the current plan?
#613
Comments
There is a class of essentially recursive plans that we don't have an explicit definition for at the moment but probably should. These are Common Table Expressions (CTEs). As such I don't think we want to outlaw recursion but I do agree the potentiality exists to make an invalid plan. At the moment it is up to the consumer to reject any plan it finds uncomfortably recursive. After we have CTEs defined and we find there are no other ways that need to be expressed we can look into restrictions that are easy to enforce at construction time. |
OK, I see how recursive CTEs are a reason to support cyclic references. If/when that should be supported, the documentation has to be changed, though: it currently says that the plan should be a DAG, which would forbid cycles/recursion. The concrete answer to my question, I guess, is that there is currently no restriction on
Right, that makes sense. But I guess we'd eventually want to know what a maximally conforming consumer is expected to do/accept, right? My understanding is that that is currently still open: it will eventually be a graph, potentially with cycles, but further restrictions on those cycles (related to recursive CTEs). |
I agree that we do need to know and describe those limits as well as the overall graph structure. If it matters to you we can try to tackle those problems now. |
OK, sounds good! For me, knowing that cycles can exist is good for. (#612 has a higher priority for me, personally.) |
Small addendum: I have checked several SQL dialects and all of them only allow self-references, not general cycles. The structure of recursive CTEs is usually a non-recursive base case and a recursive case that adds tuples to the base case until no more tuples are added. What is "added" this way is computed based on the content of "self" in the previous iteration, hence the self-references. I wouldn't know how that could be extended to general cycles... With one exception, all dialects I checked also only allowed backward references, i.e., each CTE in the WITH clause can only refer to other CTEs defined before (or itself, in case of a recursive CTE). This would correspond to restricting subtree_ordinal to be less or equal than the current ordinal. The one exception I saw is the BigQuery dialect, which also allows forward references in WITH RECURSIVE clauses, but still no cycles. So such a plan could still be ordered to consist of only forward references. I don't have a clear preference at this point. I think it's a trade-off between simplifying consumers and the ability to express things that might be desirable in the future but for which we don't have examples at this point. I am on the go right now but hopefully I'll add links later. |
My question is answered: there can be cycles (at least self-references) and there is no restriction on Unless people feel that we might want to change that, which I don't have the impression, we can close this issue now. |
I am wondering if the
subtree_ordinal
field in theReferenceRel
has any restrictions with respect to the ordinal of the current plan. One conceivable restriction would be that it is only allowed to have values lower than the one of the current plan. The specification does say that the relations in a plan form a DAG, and such a restriction would (1) enforce the fact that the relations form a DAG and (2) make it easier to reconstruct the references when converting the Substrait plan into any other format that cannot handle dangling references.If the answer is yes (or people think that such a restriction would be a good addition), then this should probably made explicit in the documentation.
If the answer is no, what happens to a plan where the references of relations form a cycle? Would such a plan simply be invalid?
The text was updated successfully, but these errors were encountered: