-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance regression when hive SerDe doesn't prefer Writables #15163
Fix performance regression when hive SerDe doesn't prefer Writables #15163
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid DateTimeZone.getDefault() in GenericHiveRecordCursor hot path
LGTM
I see too much Slice object creation in these code but they were like that from long time ago. Not to say we have to additionally improve it by avoid creating Slice's and memory allocations in this PR, but it's certainly something to explore in near future.
presto-hive/src/main/java/com/facebook/presto/hive/GenericHiveRecordCursor.java
Outdated
Show resolved
Hide resolved
Introduced in prestodb#8206, GenericHiveRecordCursor was modified to avoid extra overhead when the SerDe provided a more efficient String handling implementation with Writables. However, when the SerDe does not provide such an implementation and instead already returned String instances directly, this change introduced an extra conversion from bytes to String just to be converted back to bytes. This change alters the behavior of GenericHiveRecordCursor parseString to respect the PrimitiveObjectInspector's preference for using writables.
5c79a2a
to
2bad2ed
Compare
Which copy are you seeing that we can avoid? The only avoidable one I see is this one https://github.com/prestodb/presto/pull/15163/files#diff-dc300db27cccd3a6d61eb547021170d6R432 to copy the contents if trimming the string down to character limits resulted in a smaller string, but without it you could have large buffers being retained by much smaller strings similar to |
@yingsu00 anything else we need to do here to be mergeable? |
I'm looking at removing the usage of Slice at all. The overhead creating Slices's is too high especially when the payload size is small. In our production workload we've seen it causing many GC issues. But it's not the scope of this PR. I'll accept it. |
@mbasmanova Do you want to review for a second round of this PR? |
Introduced in #8206
GenericHiveRecordCursor
was modified to avoid extra overhead when the SerDe provided a more efficient String handling implementation with Writables. However, when the SerDe does not provide such an implementation and instead already returned String instances directly, this change introduced an extra conversion from bytes to String just to be converted back to bytes.This change alters the behavior of
GenericHiveRecordCursor#parseString
to respect the PrimitiveObjectInspector's preference for using writables.