Problem
In the TableSpec.select { row => Seq(...) } DSL, transformation functions like mapString and mapOptString cannot access other columns in the current row. The RawRow parameter exists in TransformedColumn's function signature (RawRow => Option[String] => Option[String]) but is discarded by mapString and mapOptString:
  def mapString(f: String => String): TransformedColumn =
    TransformedColumn(name, Lens.Direct, _ => opt => opt.map(f)) // _ discards RawRow

  def mapOptString(f: Option[String] => Option[String]): TransformedColumn =
    TransformedColumn(name, Lens.Direct, _ => f) // _ discards RawRow

This means transformations can only see the current column's value, not values from other columns in the same row.
Use Cases
- Conditionally anonymize a column based on another column's value
- Compose values from multiple columns
- Apply different transformations based on row context
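To make these use cases concrete, here is a minimal sketch of what row-aware transformation functions could look like. It only reuses the curried shape RawRow => Option[String] => Option[String] from the issue. The RawRow stand-in (a Map from column name to optional value) and the column names is_internal, first_name and last_name are assumptions for illustration, not the project's real definitions.

```scala
object RowContextSketch {
  // Assumption: stand-in for the project's RawRow, column name -> optional value.
  type RawRow = Map[String, Option[String]]

  // Same curried shape as TransformedColumn's function in the issue.
  type RowTransform = RawRow => Option[String] => Option[String]

  // Conditional anonymization: keep the value only when the hypothetical
  // "is_internal" column of the same row is "true"; otherwise redact it.
  val redactUnlessInternal: RowTransform = row => value =>
    if (row.get("is_internal").flatten.contains("true")) value
    else value.map(_ => "REDACTED")

  // Composition: ignore the current value and build one from two other
  // (hypothetical) columns; yields None if either of them is missing.
  val fullName: RowTransform = row => _ =>
    for {
      first <- row.get("first_name").flatten
      last  <- row.get("last_name").flatten
    } yield s"$first $last"

  def main(args: Array[String]): Unit = {
    val row: RawRow = Map(
      "is_internal" -> Some("false"),
      "first_name"  -> Some("Ada"),
      "last_name"   -> Some("Lovelace")
    )
    println(redactUnlessInternal(row)(Some("ada@example.com")))
    println(fullName(row)(None))
  }
}
```

Neither function can be passed to mapString or mapOptString today, because both need the row as their first argument.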
Proposed Solution
Add new methods on SourceColumn that expose the RawRow to the transformation function, e.g.:
  def mapWithRow(f: RawRow => String => String): TransformedColumn =
    TransformedColumn(name, Lens.Direct, rawRow => opt => opt.map(f(rawRow)))

  def mapOptWithRow(f: RawRow => Option[String] => Option[String]): TransformedColumn =
    TransformedColumn(name, Lens.Direct, rawRow => f(rawRow))
Files
- simple-anonymizer/src/scala/simpleanonymizer/OutputColumn.scala
- simple-anonymizer/src/scala/simpleanonymizer/RawRow.scala
- simple-anonymizer/src/scala/simpleanonymizer/TableSpec.scala
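The following self-contained sketch shows how the proposed mapWithRow and mapOptWithRow might thread the row through to the caller's function. The RawRow, Lens, TransformedColumn and SourceColumn definitions here are simplified stand-ins written for this example, and the run helper and the column names (role, email, username, display_name) are hypothetical. The real types in the files listed above may differ.

```scala
object MapWithRowSketch {
  // Assumption: simplified stand-ins, not the project's real definitions.
  type RawRow = Map[String, Option[String]]

  sealed trait Lens
  object Lens { case object Direct extends Lens }

  final case class TransformedColumn(
      name: String,
      lens: Lens,
      f: RawRow => Option[String] => Option[String]
  ) {
    // Hypothetical helper: apply the transform to this column's value in `row`.
    def run(row: RawRow): Option[String] = f(row)(row.get(name).flatten)
  }

  final case class SourceColumn(name: String) {
    // Existing behaviour from the issue: the row argument is discarded.
    def mapString(f: String => String): TransformedColumn =
      TransformedColumn(name, Lens.Direct, _ => opt => opt.map(f))

    // Proposed: pass the row to `f`; a null (None) value stays None.
    def mapWithRow(f: RawRow => String => String): TransformedColumn =
      TransformedColumn(name, Lens.Direct, rawRow => opt => opt.map(f(rawRow)))

    // Proposed: pass the row to `f` and let it handle nulls itself.
    def mapOptWithRow(f: RawRow => Option[String] => Option[String]): TransformedColumn =
      TransformedColumn(name, Lens.Direct, rawRow => f(rawRow))
  }

  // Row-dependent transformation: only admins keep their real email.
  val maskedEmail: TransformedColumn =
    SourceColumn("email").mapWithRow { row => email =>
      if (row.get("role").flatten.contains("admin")) email else "user@example.com"
    }

  // Null handling: fall back to another column when this one is empty.
  val displayName: TransformedColumn =
    SourceColumn("display_name").mapOptWithRow { row => opt =>
      opt.orElse(row.get("username").flatten)
    }

  def main(args: Array[String]): Unit = {
    val row: RawRow = Map(
      "role"         -> Some("member"),
      "email"        -> Some("ada@corp.test"),
      "username"     -> Some("ada"),
      "display_name" -> None
    )
    println(maskedEmail.run(row))
    println(displayName.run(row))
  }
}
```

In this sketch mapWithRow mirrors mapString (the function never sees a null), while mapOptWithRow mirrors mapOptString, so a fallback such as the display_name example needs the Opt variant.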