-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add value_type
to Column.from_vector
and expected_value_type
to Column.map
and Column.zip
#7637
Add value_type
to Column.from_vector
and expected_value_type
to Column.map
and Column.zip
#7637
Conversation
09469dd
to
748e299
Compare
748e299
to
b2f7c30
Compare
@@ -15,6 +15,10 @@ public static TextType variableLengthWithLimit(long maxLength) { | |||
} | |||
|
|||
public boolean fits(String string) { | |||
if (string == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is null a valid representation for an empty string? It seems like this should be filtered out earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an empty string, it's a missing string and it is valid (our columns contain null
s for missing values in a row).
h x = "_"+x.to_text | ||
r4 = c1.map h expected_value_type=(Value_Type.Char variable_length=False size=3) | ||
r4.should_fail_with Invalid_Value_Type | ||
r4.catch.to_display_text . should_contain "Expected type Char (fixed length, size=3), but got a value _1 of type Char (fixed length, size=2)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious -- why is .catch needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would likely work without it, but on principle I want to catch
the dataflow error and lift it to value and inspect its display text, not the display text of the 'error value'. It's subtle and in case it is should_contain
does not really matter. It would matter if I used should_equal
- without catch
I'd then get a Error:
prefix.
As suggested by @JaroslavTulach I've added a benchmark comparing the performance of The results were as follows:
We can see that specifying the data-type can be as much as 3x faster than relying on
The slight overhead of Curiously the timings between 16-bit and 64-bit values are comparable (16-bit part seemed to even be faster in my run, but I think it may be due to warmup, the timings are very similar). Normally, I'd expect 16-bit to be slower, because they have the added check that ensures that the integer values fit in the 16-bit numeric range. Somehow, there is no noticeable difference.
Raw results
|
If the storage is
The difference here is so huge that using VisualVM Polyglot Sampler could highlight where the time is spent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Akirathan will be happy to see the benchmark code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Pull Request Description
value_type
support tofrom_vector
,map
andzip
in in-memoryColumn
. #6111iif
orfill_nothing
is given aMixed
column, the result will also beMixed
regardless of theinferred_precise_value_type
.Important Notes
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Scala,
Java,
and
Rust
style guides. In case you are using a language not listed above, follow the Rust style guide.
./run ide build
.