Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor: Improve DataFrame docs, add examples #9159

Merged
merged 5 commits into from
Feb 9, 2024
Merged

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Feb 8, 2024

Which issue does this PR close?

Part of #7013

Rationale for this change

I was watching @carols10cents try to find how to write a parquet file via a DataFrame in the API docs and they left something to be desired

What changes are included in this PR?

  1. Add examples to DataFrame:: write_json, write_csv, write_parquet
  2. Improve more of the DataFrame documentations

Are these changes tested?

Covered by CI doc tests

Are there any user-facing changes?

Better docs

@github-actions github-actions bot added the core Core DataFusion crate label Feb 8, 2024
@alamb alamb added documentation Improvements or additions to documentation and removed core Core DataFusion crate labels Feb 8, 2024
/// ).await?;
/// # Ok(())
/// # }
/// ```
pub async fn write_parquet(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the example I wanted to write, but I started clicking through the API to get here and I saw too much along the way that needed to be fixed too, so I got a bit carried away and worked on quite a bit of the docs

@github-actions github-actions bot added core Core DataFusion crate and removed documentation Improvements or additions to documentation labels Feb 8, 2024
Copy link
Contributor

@carols10cents carols10cents left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOVE IT!! Thank you!!!!

I just have a few tiny things I noticed, but these docs are so much better!! 😍

datafusion/core/src/dataframe/mod.rs Outdated Show resolved Hide resolved
datafusion/core/src/dataframe/mod.rs Outdated Show resolved Hide resolved
datafusion/core/src/dataframe/mod.rs Outdated Show resolved Hide resolved
/// Convert the logical plan represented by this DataFrame into a physical plan and
/// execute it, collecting all resulting batches into memory
/// Executes this DataFrame and collects all results into a vector of RecordBatch.
/// Execute this `DataFrame` and buffer all resulting `RecordBatch`es into memory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change in particular is excellent! The reason I'd want to call this method is precisely because I don't want to think about logical plans and physical plans and such 😉

/// Note: This method should not be used outside testing, as it loses the
/// snapshot of the [`SessionState`] attached to this [`DataFrame`] and
/// consequently subsequent operations may take place against a different
/// state (e.g. a different value of `now()`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this example!

datafusion/core/src/dataframe/mod.rs Outdated Show resolved Hide resolved
@@ -1025,6 +1107,7 @@ impl DataFrame {
}

/// Write this DataFrame to the referenced table by name.
///
/// This method uses on the same underlying implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line wasn't changed in this PR, but it caught my eye... it seems like there's an extra word?

Suggested change
/// This method uses on the same underlying implementation
/// This method uses the same underlying implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this comment - will make a follow on PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
@alamb
Copy link
Contributor Author

alamb commented Feb 8, 2024

Thank you for the review @carols10cents 🙏

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work thanks @alamb

@alamb
Copy link
Contributor Author

alamb commented Feb 9, 2024

Great work thanks @alamb

Thanks @comphead ❤️

@alamb alamb merged commit 701e0dd into main Feb 9, 2024
43 checks passed
@alamb alamb deleted the alamb/write_parquet_docs branch February 9, 2024 01:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants