initCap behavior mismatches Spark semantics #1680

@yew1eb

Describe the bug
The current initcap implementation uses DataFusion's initcap, which does not match Spark's semantics.
Spark's initcap function semantics (per its description):

Returns `str` with the first letter of each word in uppercase. 
All other letters are in lowercase. Words are delimited by white space.

DataFusion's initcap function semantics (per its description):

Capitalizes the first character in each word in the input string. 
Words are delimited by non-alphanumeric characters.  

#1549

To Reproduce
Run the following SQL statements:

CREATE TABLE tbl (id INT, txt STRING) USING parquet;
INSERT INTO tbl VALUES
  (1, 'Hello_world'),
  (2, 'robert rose-smith'),
  (3, 'foo.bar/baz');

SELECT id, initcap(txt) FROM tbl;

Expected behavior

1,Hello_world
2,Robert Rose-smith
3,Foo.bar/baz

Actual behavior

1,Hello_World
2,Robert Rose-Smith
3,Foo.Bar/Baz
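
The gap between the expected and actual output can be illustrated with a minimal Python sketch of the two word-delimiter rules quoted in the descriptions above. The function names `initcap_whitespace` and `initcap_non_alnum` are hypothetical and exist only for this illustration; neither is the actual Spark or DataFusion code, and the real implementations may differ in details such as which characters count as white space.

```python
def initcap_whitespace(s: str) -> str:
    """Sketch of the Spark-style rule: words are delimited by white space only."""
    out = []
    start_of_word = True
    for ch in s:
        if ch.isspace():
            # White space ends the current word; the next character starts a new one.
            out.append(ch)
            start_of_word = True
        elif start_of_word:
            out.append(ch.upper())
            start_of_word = False
        else:
            out.append(ch.lower())
    return "".join(out)


def initcap_non_alnum(s: str) -> str:
    """Sketch of the DataFusion-style rule: any non-alphanumeric character delimits words."""
    out = []
    start_of_word = True
    for ch in s:
        if ch.isalnum():
            out.append(ch.upper() if start_of_word else ch.lower())
            start_of_word = False
        else:
            # '_', '-', '.', '/' and similar characters all start a new word here.
            out.append(ch)
            start_of_word = True
    return "".join(out)


for txt in ["Hello_world", "robert rose-smith", "foo.bar/baz"]:
    print(txt, "|", initcap_whitespace(txt), "|", initcap_non_alnum(txt))
```

Under these assumptions, the first function reproduces the rows under "Expected behavior" and the second reproduces the rows under "Actual behavior", which suggests the mismatch comes from the delimiter rule alone.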
