Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosmos: subquery pushdown #33968

Open
roji opened this issue Jun 12, 2024 · 1 comment
Open

Cosmos: subquery pushdown #33968

roji opened this issue Jun 12, 2024 · 1 comment

Comments

@roji
Copy link
Member

roji commented Jun 12, 2024

No description provided.

@roji
Copy link
Member Author

roji commented Jun 30, 2024

I spent some time on this today, here are some findings. tl;dr Cosmos limitations around LIMIT/OFFSET make subquery pushdown only barely useful, as LIMIT/OFFSET are the main reason. I'll shelf this for now - see WIP in https://github.com/roji/efcore/tree/CosmosPushdown.

  • Cosmos doesn't support LIMIT/OFFSET in subqueries.
  • As a result, we have special translation logic for LINQ Skip/Offset in subqueries; rather than translating to LIMIT/OFFSET, we convert the subquery in question to an array via the Cosmos ARRAY() operator, and compose ARRAY_SLICE() over that. This allows us to translate e.g. context.Blogs.Where(b => b.Ints.Skip(2).Take(3)...). Note that these query shapes don't involve a pushdown - the translation of the Where predicate is aware that it's in subquery context, and takes that into account.
  • An actual pushdown needs to happen when LINQ operators are composed in an order which would be incompatible with the SELECT clause evaluation order. For example, at the top-level, if Take().Skip() are composed, Skip() would need to trigger a a pushdown to make sure Take() is evaluated first.
  • Unfortunately, the ARRAY_SLICE() technique can't be used when pushing down a top-level query, since Cosmos doesn't support referencing the root container from inside an ARRAY() subquery. So although SELECT * FROM (SELECT * FROM root c) works fine (note root referenced inside the subquery, the following does not: SELECT * FROM (SELECT VALUE ARRAY(SELECT * FROM root c)). This makes it impossible for us to perform pushdown on a (top-level) query that has LIMIT/OFFSET.
  • Apart from LIMIT/OFFSET which can't be supported, the remaining causes for pushdown are:
    • Composing aggregate functions (e.g. Count) over DISTINCT
    • Composing ORDER BY over DISTINCT (but this shouldn't actually be needed, see #34121).
  • Note that since for subqueries we already avoid LIMIT/OFFSET, translating ARRAY_SLICE() as above; so once again there's little need to push down within subquery context as well (only the above two cases).
  • As a result, I'm shelving this effort for the meantime and will consult with the Cosmos people. If LIMIT/OFFSET in subqueries becomes supported, or some other workaround is found, we can bring this back and implement proper pushdown.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant