Allow select function to read source column values #3

@nafg

Problem

In the TableSpec.select { row => Seq(...) } DSL, transformation functions like mapString and mapOptString cannot access other columns in the current row. The RawRow parameter exists in TransformedColumn's function signature (RawRow => Option[String] => Option[String]) but is discarded by mapString and mapOptString:

def mapString(f: String => String): TransformedColumn =
  TransformedColumn(name, Lens.Direct, _ => opt => opt.map(f))  // _ discards RawRow

def mapOptString(f: Option[String] => Option[String]): TransformedColumn =
  TransformedColumn(name, Lens.Direct, _ => f)  // _ discards RawRow

This means transformations can only see the current column's value, not values from other columns in the same row.

Use Cases

  • Conditionally anonymize a column based on another column's value
  • Compose values from multiple columns
  • Apply different transformations based on row context

Proposed Solution

Add new methods on SourceColumn that expose the RawRow to the transformation function, e.g.:

def mapWithRow(f: RawRow => String => String): TransformedColumn =
  TransformedColumn(name, Lens.Direct, rawRow => opt => opt.map(f(rawRow)))

def mapOptWithRow(f: RawRow => Option[String] => Option[String]): TransformedColumn =
  TransformedColumn(name, Lens.Direct, rawRow => f(rawRow))
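
To make the proposal concrete, here is a minimal, self-contained sketch of how the two methods might be used for the first two use cases. The RawRow, Lens, TransformedColumn and SourceColumn definitions below are simplified stand-ins written for this example, not the library's real types: the actual RawRow API, the field name transform, and the run helper are all assumptions.

```scala
// Hypothetical stand-ins for the simple-anonymizer types. The real
// definitions in OutputColumn.scala / RawRow.scala may differ.
object MapWithRowSketch {
  // Assumed shape: a row as a map from column name to string value.
  final case class RawRow(values: Map[String, String]) {
    def get(column: String): Option[String] = values.get(column)
  }

  sealed trait Lens
  object Lens { case object Direct extends Lens }

  // The function type matches the signature quoted in the issue;
  // the field name `transform` is an assumption.
  final case class TransformedColumn(
      name: String,
      lens: Lens,
      transform: RawRow => Option[String] => Option[String]
  )

  final case class SourceColumn(name: String) {
    // Proposed: the row is passed through instead of being discarded.
    def mapWithRow(f: RawRow => String => String): TransformedColumn =
      TransformedColumn(name, Lens.Direct, rawRow => opt => opt.map(f(rawRow)))

    def mapOptWithRow(f: RawRow => Option[String] => Option[String]): TransformedColumn =
      TransformedColumn(name, Lens.Direct, rawRow => f(rawRow))
  }

  // Use case 1: conditionally anonymize a column based on another column.
  // Admin emails are kept; every other email is replaced.
  val email: TransformedColumn =
    SourceColumn("email").mapWithRow { row => value =>
      if (row.get("role").contains("admin")) value else "redacted@example.com"
    }

  // Use case 2: compose a value from multiple columns, ignoring the
  // column's own current value.
  val displayName: TransformedColumn =
    SourceColumn("display_name").mapOptWithRow { row => _ =>
      for {
        first <- row.get("first_name")
        last  <- row.get("last_name")
      } yield s"$first $last"
    }

  // Hypothetical helper that applies a column's transformation to a row.
  def run(col: TransformedColumn, row: RawRow): Option[String] =
    col.transform(row)(row.get(col.name))
}
```

Keeping the curried shape RawRow => String => String mirrors the existing TransformedColumn signature, so the new methods would only forward the row rather than change how columns are evaluated.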

Files

  • simple-anonymizer/src/scala/simpleanonymizer/OutputColumn.scala
  • simple-anonymizer/src/scala/simpleanonymizer/RawRow.scala
  • simple-anonymizer/src/scala/simpleanonymizer/TableSpec.scala
