-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clarify semantics in order_by docstring #518
Conversation
Deploying datachain-documentation with Cloudflare Pages
|
Order is not guaranteed when steps are added after an `order_by` statement. | ||
I.e. when using `from_dataset` an `order_by` statement should be used if | ||
the order of the records in the chain is important. | ||
Using `order_by` directly before `limit` will give expected results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume collect
, show
, to_parquet
- are not considered as steps and will give also a "desired" output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collect
- should but not guranteed (explanation in the original ticket #477)
show
- yes
to_parquet
- should but not guaranteed (uses collect_flatten
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, probably needs to be a separate ticket then - e.g. combine things like order_by().collect()
into a single statement, wdyt? (can be / should be implemented separately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean adding methods like collect_ordered
OR adding an explicit order_by
param to those methods or patching those functions? Is there a specific use case we're targeting or are we just trying to avoid confusion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a basic expectation that if you do order_by().collect()
that you are getting a sorted result? collect is a way to iterate on the results I guess.
I mean that collect
probably should be implemented similar to show() - something like ``order_by().collect()` should be a single SQL query probably
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #518 +/- ##
=======================================
Coverage 87.87% 87.87%
=======================================
Files 96 96
Lines 9873 9873
Branches 1349 1349
=======================================
Hits 8676 8676
Misses 857 857
Partials 340 340
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏👍
Closes #477