Docs: Add Daft into Iceberg documentation #9836

Merged: 14 commits merged on Mar 20, 2024

jaychia
Contributor

@jaychia jaychia commented Feb 29, 2024

  • Adds installation examples
  • Adds code examples for getting up and running with Daft + PyIceberg
  • Adds a type conversion matrix between Daft and PyIceberg

@github-actions github-actions bot added the docs label Feb 29, 2024
Collaborator

@bitsondatadev bitsondatadev left a comment

Sorry, this is a lot, but I had a lot of thoughts there. Got stuck.

jaychia and others added 7 commits March 1, 2024 16:00
Co-authored-by: Brian "bits" Olsen <bits@bitsondata.dev>
Co-authored-by: Brian "bits" Olsen <bits@bitsondata.dev>
Co-authored-by: Brian "bits" Olsen <bits@bitsondata.dev>
Co-authored-by: Brian "bits" Olsen <bits@bitsondata.dev>
@jaychia jaychia requested a review from bitsondatadev March 2, 2024 22:51
Contributor

@nastra nastra left a comment

Given that this mainly targets pyiceberg, I think this doc should live in https://github.com/apache/iceberg-python/

@jaychia
Contributor Author

jaychia commented Mar 5, 2024

> Given that this mainly targets pyiceberg, I think this doc should live in https://github.com/apache/iceberg-python/

Hello! The Daft query engine currently integrates with PyIceberg as a user-facing API, but it isn't actually tied to the PyIceberg project. In fact, we are actively exploring a move to iceberg-rs as that project matures and we contribute functionality to it, which would allow more native integration with Daft's Rust layer.

Daft is a fully featured distributed query engine, and we are actively working on functionality that is not PyIceberg-specific and applies more broadly to the Iceberg ecosystem (e.g., partitioned writes, compaction stored procedures, and orphan file pruning procedures). This is in contrast to PyIceberg-only integrations such as Pandas/Arrow, which use PyIceberg only to retrieve data into Python memory.

@bitsondatadev
Collaborator

@nastra, I tend to agree with @jaychia on this one. I don't want to split up the documentation any more than necessary. Any compute engine that runs on Iceberg, I want to

I see this eventually looking like Trino's data sources or Kafka Connectors.

We've discussed this a bit before here: #9681

I think this future reorganization will include engines written in languages other than Java.

- limitations under the License.
-->

# Daft
Contributor

@nastra nastra Mar 6, 2024

It seems the site can't actually be built when serving the docs locally.

Contributor Author

I was able to successfully build the site with mkdocs serve!

INFO    -  Documentation built in 0.38 seconds
INFO    -  [12:52:52] Watching paths for changes: 'docs', 'mkdocs.yml'
INFO    -  [12:52:52] Serving on http://127.0.0.1:8000/

Did you get an error message here that I can look at to debug the issue?

Contributor

@nastra nastra Mar 7, 2024

I was using https://github.com/apache/iceberg/blob/aff5b39a7dddd22790b6ba47f514860c53e33c00/site/README.md to serve the site locally. @bitsondatadev, can you please double-check whether the site renders properly for you when running ./dev/serve.sh?

Collaborator

Yeah, I used to have a "nightly" build in there, but we took it out initially to avoid confusion. I think part of the build could simply add a "local" version or something similar. Currently, the build grabs the latest semantic version and points "latest" at it; we could do the same, point /site/docs/docs/local to /docs, and maybe expose another build option to enable that.

Contributor

@bitsondatadev is this PR here ready to go in?

jaychia and others added 3 commits March 6, 2024 12:48
Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
Collaborator

@bitsondatadev bitsondatadev left a comment

@nastra Ship it!

@nastra nastra merged commit f8d60ea into apache:main Mar 20, 2024
2 checks passed
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
3 participants