Add statistics on DataFrame support #1153
Implemented new functions for calculating min, max, median, and percentile on DataFrames. Added corresponding tests covering different scenarios, and marked the new API with annotations such as `@Refine` and `@Interpretable` so the compiler plugin can refine the resulting schema.
Pull Request Overview
This PR adds comprehensive support for statistical operations on DataFrames by introducing new functions and tests for mean, std, median, min, max, and percentile. Key changes include:
- New test cases in Kotlin for each statistical function.
- Updates to API methods and aggregator implementations with new interpretable annotations.
- Adjustments in loadInterpreter, Aggregators, and GroupBy modules to integrate the statistics operations.
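
To make the operations listed above concrete, here is a minimal plain-Kotlin sketch of column-wise statistics. It does not use the DataFrame library: `Table`, `median`, `percentileNearestRank`, and `aggregate` are hypothetical names introduced only for illustration, and the nearest-rank percentile definition is an assumption, not necessarily the method the library's aggregators use.

```kotlin
import kotlin.math.ceil

// Hypothetical stand-in for a DataFrame: column name -> numeric values.
typealias Table = Map<String, List<Double>>

// Median: middle value of the sorted column, or the mean of the two middle values.
fun median(values: List<Double>): Double {
    require(values.isNotEmpty()) { "median of an empty column is undefined" }
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

// Nearest-rank percentile (an assumed definition): the smallest value such that
// at least p percent of the data is less than or equal to it.
fun percentileNearestRank(values: List<Double>, p: Double): Double {
    require(values.isNotEmpty()) { "percentile of an empty column is undefined" }
    require(p in 0.0..100.0) { "p must be within 0..100" }
    val sorted = values.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt().coerceAtLeast(1)
    return sorted[rank - 1]
}

// Apply one statistic to every column, producing a single value per column name.
fun Table.aggregate(stat: (List<Double>) -> Double): Map<String, Double> =
    mapValues { (_, column) -> stat(column) }

fun main() {
    val table: Table = mapOf(
        "age" to listOf(15.0, 20.0, 35.0, 40.0, 50.0),
        "weight" to listOf(54.0, 70.0, 62.0, 81.0),
    )
    println(table.aggregate { col -> col.minOf { it } })
    println(table.aggregate { col -> col.maxOf { it } })
    println(table.aggregate(::median))
    println(table.aggregate { percentileNearestRank(it, 30.0) })
}
```

Each call yields one value per column, which mirrors the idea of a whole-frame statistic returning a single row.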
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| plugins/kotlin-dataframe/tests-gen/org/jetbrains/kotlin/fir/dataframe/DataFrameBlackBoxCodegenTestGenerated.java | Added new test methods for statistical operations. |
| plugins/kotlin-dataframe/testData/box/*.kt | New test files for std, percentile, min, median, mean, and max operations. |
| plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/loadInterpreter.kt | Imported and registered new aggregator implementations. |
| plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/impl/api/statistics.kt | Updated aggregator implementations to handle new statistics functions. |
| core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/statistics.kt | Added and updated tests for DataFrame statistics methods. |
| core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregators.kt | Modified aggregator definitions for min, max, median, and percentile. |
| core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/*.kt | Added annotations and refinements for new API methods (std, percentile, min, median, mean, max). |
@koperagen I got `org.jetbrains.kotlin.util.FileAnalysisException: While analysing /percentile.kt:32:16: org.jetbrains.kotlinx.dataframe.plugin.InterpretationFrameworkError: ERROR: Different set of arguments` when running this code.
Replace `percentileArg` with a fixed value of 30.0 in test cases to ensure clearer functionality demonstration. Add `Arguments.percentile` with `ignore()` for improved handling and schema modification alignment in `Percentile0` and `Percentile1` classes.
Replaced schema construction to include only newly generated columns instead of merging with existing ones. Updated test cases to validate schema consistency using `compareSchemas`.
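
The difference between the two schema strategies described in this commit can be sketched in plain Kotlin. The plugin's real schema types are not shown in this thread, so `Column`, `Schema`, `mergeWithExisting`, `onlyGenerated`, and `sameSchema` are hypothetical names used only for illustration; `sameSchema` is loosely modelled on the idea of a `compareSchemas` check and is not its actual implementation.

```kotlin
// Hypothetical, simplified schema model (not the plugin's real types).
data class Column(val name: String, val type: String)
data class Schema(val columns: List<Column>)

// Earlier approach as described: keep the existing columns and add the
// generated ones, letting a generated column replace one with the same name.
fun mergeWithExisting(existing: Schema, generated: List<Column>): Schema {
    val generatedNames = generated.map { it.name }.toSet()
    return Schema(existing.columns.filter { it.name !in generatedNames } + generated)
}

// Revised approach as described: the result holds only the generated columns.
fun onlyGenerated(generated: List<Column>): Schema = Schema(generated)

// Order-insensitive schema equality, an assumed stand-in for a schema comparison check.
fun sameSchema(a: Schema, b: Schema): Boolean = a.columns.toSet() == b.columns.toSet()

fun main() {
    val existing = Schema(listOf(Column("name", "String"), Column("age", "Int")))
    val generated = listOf(Column("age", "Double"))

    println(mergeWithExisting(existing, generated)) // still carries the "name" column
    println(onlyGenerated(generated))               // carries only the generated "age" column
}
```

Under these assumptions, a statistic that produces values only for some columns no longer carries unrelated columns into its result schema.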
koperagen left a comment:
Very cool! Please check my comment, and let's wait for the build to finish. Otherwise looks good
Do you have a test for this?