Description
I had an idea for the compiler plugin which could make working with it a bit easier. I already discussed it with @koperagen some time ago, but let's put it here so we have something to keep track of:
The idea is this: Let's say you have a large function like this that consists solely of interpretable dataframe operations:
fun <T> DataFrame<T>.convertDuckDbTypes() = this
    .convert { colsOf<java.sql.Array>() }.with(infer = Infer.Type) { (it.array as Array<Any?>).toList() }
    .convert { colsOf<java.sql.Struct>() }.with { (it as DuckDBStruct).map }
    .convert { colsOf<Map<String, *>>() }.with { it.mapValues { listOf(it.value) }.toDataFrame().single() }
    .convert { colsOf<JsonNode>() }.with { DataRow.readJsonStr(it.toString()) }
    .convert { colsOf<java.time.LocalTime>() }.toLocalTime()
    .convert { colsOf<java.time.LocalDate>() }.toLocalDate()
    .convert { colsOf<java.time.OffsetDateTime>() }.with { it.toInstant().toKotlinInstant() }
    .convert { colsOf<java.sql.Timestamp>() }.with { it.toLocalDateTime().toKotlinLocalDateTime() }
Since the compiler plugin can reason about all function invocations inside the body, it should be able to infer the combined effect of the entire convertDuckDbTypes() function.
We might need to introduce a new annotation, like @GenericInterpretable or simply @Interpretable without arguments, so that the compiler plugin becomes aware of this type of function. This also allows us to report errors when a user puts something inside that the compiler plugin cannot reason about.
Alternatively, it may be possible to recognize these types of functions automatically without annotations, but I'm not entirely sure. What they have in common is a DataFrame type as both the input and the output.
The return type of this function would also need to be different from DataFrameType_123, as it would be a mapping from an original dataframe type to a new one.
Something like this:
@Interpretable
@Refine
fun <T : HasId> DataFrame<T>.doSomething(): DataFrame<GenericDataFrameType_123> = renameToCamelCase().convert { id }.toLong()
val df1: DataFrame<HasId> = ...
val myId: Int = df1.ID
val df2: DataFrame<HasId_1> = df1.doSomething()
val myId2: Long = df2.id
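To make the "mapping from an original dataframe type to a new one" concrete, here is a toy model in plain Kotlin. It is only a sketch of the idea, not how the compiler plugin is implemented: a schema is modelled as a map from column name to type name, each operation is a function from schema to schema, and the effect of the whole function is the composition of those steps. All names here (Schema, SchemaTransform, renameToCamelCaseEffect, convertToLongEffect, compose) are hypothetical, and the camel-case rule is deliberately simplified.

```kotlin
// Toy model: a schema maps column names to type names.
typealias Schema = Map<String, String>

// The schema-level effect of one dataframe operation.
typealias SchemaTransform = (Schema) -> Schema

// Hypothetical stand-in for the schema effect of renameToCamelCase().
// Simplified rule: all-caps names are lowercased, otherwise only the first character is.
val renameToCamelCaseEffect: SchemaTransform = { schema ->
    schema.mapKeys { (name, _) ->
        if (name == name.uppercase()) name.lowercase()
        else name.replaceFirstChar { it.lowercase() }
    }
}

// Hypothetical stand-in for the schema effect of convert { column }.toLong().
fun convertToLongEffect(column: String): SchemaTransform = { schema ->
    schema.mapValues { (name, type) -> if (name == column) "Long" else type }
}

// The effect of a function body is its steps applied left to right.
fun compose(vararg steps: SchemaTransform): SchemaTransform = { schema ->
    steps.fold(schema) { acc, step -> step(acc) }
}

fun main() {
    // Mirrors doSomething(): renameToCamelCase().convert { id }.toLong()
    val doSomethingEffect = compose(renameToCamelCaseEffect, convertToLongEffect("id"))
    val hasId: Schema = linkedMapOf("ID" to "Int")
    println(doSomethingEffect(hasId)) // prints {id=Long}
}
```

Because the composed effect is itself a function of the input schema, it can be applied to any receiver type at each call site, which is why a single fixed return type would not be enough.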
Function arguments would need special treatment as well, but they could likely be treated similarly to const values. This could be added later, though.