-
Notifications
You must be signed in to change notification settings - Fork 707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support nesting Options in TypeDescriptor #1387
Conversation
One question: should we change TupleConvertor on String to do: val tmp = tup.getString(idx)
if (tmp == null) "" else tmp Since I think in some cases, cascading will parse empty strings in text delimited into null (when the type of the field is Object I think, which may happen now with nested Options). |
Another way to deal with the empty strings parsing as null would be to override the cleanSplit method in Cascading's DelimitedParser (by changing this method: https://github.com/cwensel/cascading/blob/wip-3.0/cascading-core/src/main/java/cascading/scheme/util/DelimitedParser.java#L281). I'm not sure if changing TupleConverter or DelimitedParser would be better. |
@sid-kap yeah, it's not clear to me either. Generally we change code we have control over (in this case the TupleConverter macros) but not 100% sure the best approach here. I think the current approach is fine, but it does convolute |
|
||
Left(expanded) | ||
object FieldBuilder { | ||
// This is method on the object to work around this compiler bug: SI-6231 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That bug was fixed for 2.10.0 by the link ?
LGTM other than i think we should use the strategy from bijection to enforce the macro doesn't work in those cases |
Support nesting Options in TypeDescriptor
I wonder if we should add a type: trait TypedTextDescriptor[T] extends TypeDescriptor[T]
object TypedTextDescriptor {
implicit def fromMacro[T]: // this assumes null strings are the same as "" on read, disallows Option[String], which is supported in databases.
} |
This changes does a few things:
Option[String]
orOption[Option[Option[Int]]]
which just flatten on the way out. Reading is ambiguous, but there seems to be demand for this feature when writing data out to be read by python, R, etc...Option[(Int, Option[Long])]
and similar. The existence checking is also more efficient: we only check one column for each option (the first "evident" column we can find).