"generic interpretable" functions for compiler plugin #1347

@Jolanrensen

I had an idea for the compiler plugin which could make working with it a bit easier. I already discussed it with @koperagen some time ago, but let's put it here so we have something to keep track of:

The idea is this: Let's say you have a large function like this that consists solely of interpretable dataframe operations:

fun <T> DataFrame<T>.convertDuckDbTypes() = this
    .convert { colsOf<java.sql.Array>() }.with(infer = Infer.Type) { (it.array as Array<Any?>).toList() }
    .convert { colsOf<java.sql.Struct>() }.with { (it as DuckDBStruct).map }
    .convert { colsOf<Map<String, *>>() }.with { it.mapValues { listOf(it.value) }.toDataFrame().single() }
    .convert { colsOf<JsonNode>() }.with { DataRow.readJsonStr(it.toString()) }
    .convert { colsOf<java.time.LocalTime>() }.toLocalTime()
    .convert { colsOf<java.time.LocalDate>() }.toLocalDate()
    .convert { colsOf<java.time.OffsetDateTime>() }.with { it.toInstant().toKotlinInstant() }
    .convert { colsOf<java.sql.Timestamp>() }.with { it.toLocalDateTime().toKotlinLocalDateTime() }

Since the compiler plugin can reason about all function invocations inside the body, it should be able to infer the combined effect of the entire convertDuckDbTypes() function.
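To make "the combined effect" a bit more concrete, here is a minimal toy sketch. It is not the compiler plugin's real API: `Schema`, `SchemaOp`, `convertColsOf`, and `compose` are all hypothetical names, and types are modeled as plain strings. The only point it illustrates is that if each operation's effect on the schema is known, the effect of the whole function body is just their composition.

```kotlin
// Toy model, NOT the plugin's real API: a schema maps column names to type names.
typealias Schema = Map<String, String>

// Each interpretable operation is modeled as a pure function on schemas.
typealias SchemaOp = (Schema) -> Schema

// Hypothetical stand-in for `convert { colsOf<From>() }` followed by a conversion to `to`.
fun convertColsOf(from: String, to: String): SchemaOp = { schema ->
    schema.mapValues { (_, type) -> if (type == from) to else type }
}

// The effect of a whole function body is the left-to-right composition of its operations.
fun compose(vararg ops: SchemaOp): SchemaOp = { schema ->
    ops.fold(schema) { acc, op -> op(acc) }
}

// Two of the conversions from convertDuckDbTypes(), expressed at the schema level.
val convertDuckDbTypesEffect: SchemaOp = compose(
    convertColsOf("java.time.LocalDate", "kotlinx.datetime.LocalDate"),
    convertColsOf("java.sql.Timestamp", "kotlinx.datetime.LocalDateTime"),
)

fun main() {
    val input: Schema = mapOf("id" to "Int", "created" to "java.sql.Timestamp")
    // Columns of other types pass through unchanged.
    println(convertDuckDbTypesEffect(input))
}
```

In this model, the function as a whole is interpretable exactly when every step in its body is, which is also where an error could be reported if one step is not.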

We might need to introduce a new annotation, like @GenericInterpretable or simply @Interpretable without arguments, such that the compiler plugin becomes aware of this type of function. This also allows us to provide errors when a user puts something inside that the compiler plugin cannot reason about.

Alternatively, it may be possible to recognize these types of functions automatically without annotations, but I'm not entirely sure. What they have in common is a DataFrame type in both the input and the output.

The return type of this function would also need to be something other than DataFrameType_123, as it would represent a mapping from the original DataFrame type to a new one.

Something like this:

@Interpretable
@Refine
fun <T : HasId> DataFrame<T>.doSomething(): DataFrame<GenericDataFrameType_123> = renameToCamelCase().convert { id }.toLong()

val df1: DataFrame<HasId> = ...
val myId: Int = df1.ID

val df2: DataFrame<HasId_1> = df1.doSomething()
val myId2: Long = df2.id
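The reason the return type has to be a mapping rather than one fixed type can be sketched with the same kind of toy model. Everything here is hypothetical: `Schema`, `toyCamelCase`, and `doSomethingEffect` are illustration-only names, and `toyCamelCase` is a drastically simplified stand-in for `renameToCamelCase()`, not a description of its real behavior.

```kotlin
// Toy model, NOT the plugin's real API: a schema maps column names to type names.
typealias Schema = Map<String, String>

// Simplified stand-in for renameToCamelCase(); the real function handles many more cases.
fun toyCamelCase(name: String): String =
    if (name.all { it.isUpperCase() }) name.lowercase()
    else name.replaceFirstChar { it.lowercase() }

// Toy schema-level effect of the proposed doSomething(): rename, then convert `id` to Long.
fun doSomethingEffect(input: Schema): Schema =
    input.mapKeys { (name, _) -> toyCamelCase(name) }
        .mapValues { (name, type) -> if (name == "id") "Long" else type }

fun main() {
    // The same function yields a different output schema per input schema.
    println(doSomethingEffect(mapOf("ID" to "Int")))                     // {id=Long}
    println(doSomethingEffect(mapOf("ID" to "Int", "Name" to "String"))) // {id=Long, name=String}
}
```

Because the output depends on whatever extra columns the caller's `T` carries, the result type would have to be computed per call site, as with `HasId_1` above.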

Function arguments would need special treatment as well, but they could likely be treated similarly to const values. This could be added later, though.

Metadata

Labels

Compiler plugin: Anything related to the DataFrame Compiler Plugin
enhancement: New feature or request
research: This requires a deeper dive to gather a better understanding
