Use StringViewArray as output of substr when input was StringArray #12338

@alamb


Is your feature request related to a problem or challenge?

Part of #11752

StringView is a new Arrow array type that allows for more efficient string processing. Specifically, it allows string data to be adjusted without copying the underlying data.

See this blog post for more details: https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/

@Kev1n8 added support for StringView to the substr function in #12044

At the moment substr produces a StringArray output when the input is a StringArray. We could instead generate a StringViewArray as output, which would be more efficient in most cases because it avoids copying the string values.
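To illustrate why a view-based output avoids the copy, here is a minimal sketch in plain Rust. It does not use the arrow-rs API: `View`, `view_str`, and `substr_view` are hypothetical names, and the view is simplified to a byte range into one shared buffer (the real StringView layout also has an inlined form for short strings and a buffer index). It also ignores SQL edge cases such as a start position below 1.

```rust
/// Hypothetical, simplified stand-in for a string view:
/// a byte range into a shared data buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: u32,
    len: u32,
}

/// Resolve a view against the shared buffer.
fn view_str<'a>(buffer: &'a str, v: View) -> &'a str {
    let start = v.offset as usize;
    &buffer[start..start + v.len as usize]
}

/// substr with a 1-based character `start` and a character `count`.
/// Returns a new view over the same buffer; no string bytes are copied.
fn substr_view(buffer: &str, v: View, start: usize, count: usize) -> View {
    let s = view_str(buffer, v);
    let skip = start.saturating_sub(1);
    // Byte position where the substring begins (end of string if `start` is past it).
    let begin = s.char_indices().nth(skip).map_or(s.len(), |(i, _)| i);
    // Byte position where the substring ends (end of string if `count` overshoots).
    let end = s[begin..]
        .char_indices()
        .nth(count)
        .map_or(s.len(), |(i, _)| begin + i);
    View {
        offset: v.offset + begin as u32,
        len: (end - begin) as u32,
    }
}
```

In this model, taking a substring only rewrites two integers per row, whereas a StringArray output has to write the selected bytes into a new contiguous values buffer.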

However, to avoid errors when substr is used inside a larger expression, we need to make sure that all the other String functions support StringView as input as well. In other words, we should wait for the "Required for enabling StringView by default" list on #11752 to be completed.

Describe the solution you'd like

  1. Change the output type of substr to be StringViewArray when the input is StringArray (note that for LargeStringArray I think we will still need to copy the data, as StringView is limited to 2^32 bytes)
  2. Change the implementation of substr to use StringView internally
  3. Add tests
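As a rough illustration of step 1 and the LargeStringArray caveat, the sketch below builds views directly from offset arrays in plain Rust. Everything here is hypothetical and not the arrow-rs API: `OffsetView`, `views_from_offsets`, and `views_from_large_offsets` are made-up names, and the `u32` fields simply mirror the 2^32 limit mentioned above, which is itself an assumption about the exact bound.

```rust
/// Hypothetical, simplified view: a byte range into a shared data buffer.
#[derive(Debug, PartialEq)]
struct OffsetView {
    offset: u32,
    len: u32,
}

/// StringArray-style input: i32 offsets always fit, so the existing
/// values buffer can be reused as-is and only the views are created.
fn views_from_offsets(offsets: &[i32]) -> Vec<OffsetView> {
    offsets
        .windows(2)
        .map(|w| OffsetView {
            offset: w[0] as u32,
            len: (w[1] - w[0]) as u32,
        })
        .collect()
}

/// LargeStringArray-style input: i64 offsets may exceed what a view can
/// address. Returns None in that case, meaning the data would have to be
/// copied into smaller buffers instead of being referenced in place.
fn views_from_large_offsets(offsets: &[i64]) -> Option<Vec<OffsetView>> {
    offsets
        .windows(2)
        .map(|w| {
            Some(OffsetView {
                offset: u32::try_from(w[0]).ok()?,
                len: u32::try_from(w[1] - w[0]).ok()?,
            })
        })
        .collect()
}
```

The `Option` return is just one way to surface the fallback; an actual implementation might instead split the data across several buffers.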

Describe alternatives you've considered

No response

Additional context

Note that @Kev1n8 has already added support for StringView to the substr function in #12044

They also suggested that this same optimization could be applied in #12044 (comment)

Labels: enhancement, help wanted
