Skip to content

Rework of dfs functions to recursively #361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Apr 20, 2023

@Jolanrensen Jolanrensen self-assigned this Apr 20, 2023
@Jolanrensen Jolanrensen added this to the 0.11.0 milestone Apr 20, 2023
@Jolanrensen Jolanrensen added the enhancement New feature or request label Apr 20, 2023
@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Apr 21, 2023

While the current tactic works in all current tests, I found a major breaking new case:

public fun <C> ColumnSet<C>.first(condition: ColumnFilter<C> = { true }): SingleColumn<C> =
    transform { listOf(it.first(condition)) }.singleImpl()
    
df.select { something.first { ... }.recursively() }

Not only does recursively() currently not work well on SingleColumns (it puts an all() in between, to flatten a potential ColumnGroup and then runs on that), it should also pass the "recursively" call through singleImpl() and deliver it to transform {}...

Maybe my original idea of creating a different return-type for transform calls that allows them to be re-executed recursively was not such a bad idea...

@Jolanrensen
Copy link
Collaborator Author

df.select { first { ... }.recursively() } now works, but I 'm not sure why. Needs more tests and maybe a clearer way to indicate a ColumnSet passes on its recursive behavior to its parent or not.
For first {} which uses singleImp(), singleImpl() was modified to pass on its recursive behavior. This might however be needed in more places.

@Jolanrensen
Copy link
Collaborator Author

Jolanrensen commented Apr 22, 2023

df.select { it[0, 3].recursively() }
what does this even mean? Take index 0 and 3 of every column group maybe. But this would then break when it hits a non-columnGroup.
Allowing .recursively() on any columnSet might open a can of worms we don't want. I think I'm gonna have another attempt at changing the return type of functions that allow a .recursively() call in favor of this generic but messy approach.

@nikitinas what do you think?

Also, I'm curious about how you divided up the use of SingleColumn<*> and ColumnSet<*>, because many functions were defined like ColumnSet<*>.something() (because SingleColumn inherits from ColumnSet too) but then assume the receiver to be a SingleColumn ColumnGroup taking its children instead of just looping over the columns in the column set. This also made dfs {} behave unexpectedly combined with all().

How I see it there are two options:

  • SingleColumn<*>.cols(): ColumnSet<*> operates on the children of the single column.
    • df.select { cols(1, 2, 3) }
    • df.select { someGroup.cols(1, 2, 3) }
    • df.select { first { it.isColumnGroup() }.cols(1, 2, 3) }
  • ColumnSet<C>.cols(): ColumnSet<C> operates on the columns in the column set.
    • df.select { colsOf<String>().cols(1, 2, 3) }
    • df.select { all().cols(1, 2, 3) }

Also, what's the difference between all(), cols(), and children()?

This distinction also impacts how .recursively() works

@Jolanrensen
Copy link
Collaborator Author

This will be closed in favor of #363


/**
*
*/
public operator fun SingleColumn<AnyRow>.get(range: IntRange): ColumnSet<*> = cols(range)
public operator fun SingleColumn<*>.get(range: IntRange): ColumnSet<*> = cols(range)
Copy link
Collaborator

@koperagen koperagen Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I think SingleColumn<AnyRow> is a ColumnGroup and these operations make sense, but not for SingleColumn<*>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes, but then it breaks when you

df.select { it["name"][0..5] }

since it doesn't know the type of the single column you select that way.

Also, I'm gonna close this PR in favor of dfs-rename3, just waiting for Tolya to answer.

@zaleslaw zaleslaw removed this from the 0.11.0 milestone Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants