@zaleslaw
Collaborator

@zaleslaw zaleslaw commented Apr 28, 2025

  • std
  • median
  • percentile
  • min
  • max
  • mean

Implemented new functions for calculating min, max, median, and percentile on DataFrames. Added corresponding tests covering different scenarios, and enabled refinement through annotations such as `@Refine` and `@Interpretable`.
@zaleslaw zaleslaw requested review from Copilot and koperagen April 28, 2025 12:20
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds comprehensive support for statistical operations on DataFrames by introducing new functions and tests for mean, std, median, min, max, and percentile. Key changes include:

  • New test cases in Kotlin for each statistical function.
  • Updates to API methods and aggregator implementations with new interpretable annotations.
  • Adjustments in loadInterpreter, Aggregators, and GroupBy modules to integrate the statistics operations.

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| plugins/kotlin-dataframe/tests-gen/org/jetbrains/kotlin/fir/dataframe/DataFrameBlackBoxCodegenTestGenerated.java | Added new test methods for statistical operations. |
| plugins/kotlin-dataframe/testData/box/*.kt | New test files for std, percentile, min, median, mean, and max operations. |
| plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/loadInterpreter.kt | Imported and registered new aggregator implementations. |
| plugins/kotlin-dataframe/src/org/jetbrains/kotlinx/dataframe/plugin/impl/api/statistics.kt | Updated aggregator implementations to handle new statistics functions. |
| core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/api/statistics.kt | Added and updated tests for DataFrame statistics methods. |
| core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/aggregation/aggregators/Aggregators.kt | Modified aggregator definitions for min, max, median, and percentile. |
| core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/api/*.kt | Added annotations and refinements for new API methods (std, percentile, min, median, mean, max). |

@zaleslaw
Collaborator Author

@koperagen I got this
`While analysing /percentile.kt:32:16: org.jetbrains.kotlinx.dataframe.plugin.InterpretationFrameworkError: ERROR: Different set of arguments
Implementation class: org.jetbrains.kotlinx.dataframe.plugin.impl.api.Percentile0@1f93c6b8
Not found in actual: []
Passed, but not expected: [percentile]
add arguments to an interpeter:
[RefinedArgument(name=percentile, expression=org.jetbrains.kotlin.fir.expressions.impl.FirLiteralExpressionImpl@6ecaf7ce)]

org.jetbrains.kotlin.util.FileAnalysisException: While analysing /percentile.kt:32:16: org.jetbrains.kotlinx.dataframe.plugin.InterpretationFrameworkError: ERROR: Different set of arguments
Implementation class: org.jetbrains.kotlinx.dataframe.plugin.impl.api.Percentile0@1f93c6b8
Not found in actual: []
Passed, but not expected: [percentile]
add arguments to an interpeter:
[RefinedArgument(name=percentile, expression=org.jetbrains.kotlin.fir.expressions.impl.FirLiteralExpressionImpl@6ecaf7ce)]`

when running this code: `val res0 = personsDf.percentile(percentile = 30.0)`. What am I doing wrong?
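
For readers unfamiliar with this error: it reports that the arguments passed at the call site (`percentile`) do not match the arguments the interpreter class (`Percentile0`) declares. The following is only a minimal sketch of that kind of check, assuming it reduces to a comparison of two sets of argument names. `ArgumentMismatch` and `compareArguments` are hypothetical names invented for illustration and are not part of the plugin.

```kotlin
// Hypothetical model of a "Different set of arguments" check.
// None of these names come from the DataFrame compiler plugin.
data class ArgumentMismatch(
    val notFoundInActual: Set<String>,     // declared by the interpreter, missing at the call site
    val passedButNotExpected: Set<String>, // present at the call site, not declared by the interpreter
)

// Returns null when both sets agree, otherwise a description of the difference.
fun compareArguments(declared: Set<String>, passed: Set<String>): ArgumentMismatch? {
    val missing = declared - passed
    val unexpected = passed - declared
    return if (missing.isEmpty() && unexpected.isEmpty()) {
        null
    } else {
        ArgumentMismatch(missing, unexpected)
    }
}

fun main() {
    // Assumed situation from the error: the call passes `percentile`,
    // but the interpreter does not declare it.
    println(compareArguments(declared = emptySet(), passed = setOf("percentile")))

    // Once the interpreter declares `percentile`, the sets agree.
    println(compareArguments(declared = setOf("percentile"), passed = setOf("percentile")))
}
```

Under this reading, the mismatch goes away once the interpreter declares an argument named `percentile`, even if it does not use the value.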

@zaleslaw zaleslaw marked this pull request as ready for review April 28, 2025 12:21
Replace `percentileArg` with a fixed value of 30.0 in test cases to demonstrate the functionality more clearly. Add `Arguments.percentile` with `ignore()` to the `Percentile0` and `Percentile1` classes for improved handling and alignment with schema modification.
Replaced schema construction to include only newly generated columns instead of merging with existing ones. Updated test cases to validate schema consistency using `compareSchemas`.
@zaleslaw zaleslaw requested a review from koperagen April 28, 2025 16:31
Collaborator

@koperagen koperagen left a comment

Very cool! Please check my comment, and let's wait for the build to finish. Otherwise, looks good.

@koperagen
Collaborator

val df = dataFrameOf("a")(1, 2, 3)
df.groupBy { a named "b" }.sum { a }

Do you have a test for this?
The result will have columns `b` and `a`. Surprisingly, when only one column is selected in `sum(ColumnsSelector)`, the aggregated column keeps that column's name.

@Jolanrensen Jolanrensen added the Compiler plugin Anything related to the DataFrame Compiler Plugin label Apr 29, 2025
@Jolanrensen Jolanrensen added this to the 1.0.0-Beta1 (0.16) milestone Apr 29, 2025
@zaleslaw zaleslaw merged commit bcabfd0 into master Apr 29, 2025
4 of 6 checks passed