[Feature](datatype) Add IPv4/v6 data type for doris #24965

Merged
merged 14 commits into from
Oct 26, 2023

@sjyango sjyango commented Sep 27, 2023

Proposed changes

Issue Number: close #21370

Describe your changes

Doris does not currently support IPv4/IPv6 data types, which are common data types in databases, so it is time to add them. The following basic functionality is used to check that the data types are fully supported:

  • support CRUD

  • support StreamLoad

  • support Serde functions

  • support (implicit/explicit) cast functions. The following cast functions have been implemented:

  1. uint8/int8 -> ipv4
  2. uint16/int16 -> ipv4
  3. uint32/int32 -> ipv4
  4. uint64/int64 -> ipv4
  5. string -> ipv4
  6. string -> ipv6
  • type-specific functions for the IP data types (for reference, see ClickHouse or MySQL)

  • a performance test of loading this data type, compared with ClickHouse or Elasticsearch
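
As a rough illustration of what the casts listed above compute, here is a minimal Python sketch built on the standard `ipaddress` module. It is not the Doris implementation: the function names `int_to_ipv4`, `str_to_ipv4`, and `str_to_ipv6` are hypothetical, and returning `None` for out-of-range or unparsable input is an assumption rather than behavior documented in this PR.

```python
import ipaddress
from typing import Optional


def int_to_ipv4(value: int) -> Optional[ipaddress.IPv4Address]:
    """Interpret an integer as a 32-bit IPv4 address (hypothetical helper)."""
    # Assumption: values outside [0, 2**32 - 1] are rejected, not wrapped.
    if not 0 <= value <= 0xFFFFFFFF:
        return None
    return ipaddress.IPv4Address(value)


def str_to_ipv4(text: str) -> Optional[ipaddress.IPv4Address]:
    """Parse dotted-decimal text into an IPv4 address, or None if invalid."""
    try:
        return ipaddress.IPv4Address(text)
    except ValueError:
        return None


def str_to_ipv6(text: str) -> Optional[ipaddress.IPv6Address]:
    """Parse colon-separated hex text into an IPv6 address, or None if invalid."""
    try:
        return ipaddress.IPv6Address(text)
    except ValueError:
        return None


if __name__ == "__main__":
    print(int_to_ipv4(3232235777))    # 192.168.1.1
    print(str_to_ipv4("256.0.0.1"))   # None (octet out of range)
    print(str_to_ipv6("2001:0db8:0000:0000:0000:0000:0000:0001"))  # 2001:db8::1
```

The sketch treats every integer width the same way, since any value that fits in 32 unsigned bits maps to exactly one IPv4 address; how the PR handles negative or 64-bit inputs is not stated in this conversation.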

sjyango commented Sep 27, 2023

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 38. Check the log or trigger a new build to see more.

@doris-robot
TeamCity be ut coverage result:
Function Coverage: 36.20% (8141/22491)
Line Coverage: 28.41% (65188/229461)
Region Coverage: 27.37% (33809/123530)
Branch Coverage: 24.07% (17288/71816)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0074eb56e95fc6d7597bde9ebf9b82d6f469fc40_0074eb56e95fc6d7597bde9ebf9b82d6f469fc40/report/index.html

@doris-robot
(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.89 seconds
stream load tsv: 579 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162311186 Bytes
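
The rates in this report appear to use binary megabytes (1 MB = 1,048,576 bytes) and to be rounded down. That is an inference from the numbers, not something the bot states. A small Python check under that assumption, with hypothetical helper names `mib_per_second` and `k_rows_per_second`:

```python
def mib_per_second(total_bytes: int, seconds: float) -> int:
    """Throughput in binary megabytes per second, truncated to an integer."""
    return int(total_bytes / seconds / (1024 * 1024))


def k_rows_per_second(rows: int, seconds: float) -> int:
    """Insert rate in thousands of rows per second, truncated to an integer."""
    return int(rows / seconds / 1000)


if __name__ == "__main__":
    print(mib_per_second(74807831229, 579))   # 123 (stream load tsv)
    print(mib_per_second(2358488459, 21))     # 107 (stream load json)
    print(mib_per_second(1101869774, 64))     # 16  (stream load orc)
    print(mib_per_second(861443392, 32))      # 25  (stream load parquet)
    print(k_rows_per_second(10000000, 29.3))  # 341 (insert into select)
```

With decimal megabytes the tsv figure would come out near 129 MB/s instead of 123, which is why the binary interpretation seems the likelier one.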

sjyango commented Sep 28, 2023

run p0

@amorynan amorynan left a comment

LGTM

@github-actions
PR approved by anyone and no changes requested.

@amorynan
@xiaokang @yiguolei @morrySnow please review this

@morrySnow
please add rule descriptions for conversions with other types

sjyango commented Sep 28, 2023

please add rule descriptions for conversions with other types

What is the format of rule descriptions for conversions? Are there any relevant examples?

starocean999 commented Sep 28, 2023

In general, there is more work to do for Nereids (at least):

  1. Add literal definition of IPV4 and IPV6 like old planner
  2. modify fe/fe-core/src/main/java/org/apache/doris/nereids/types/DataType.java for IP types
  3. modify fe/fe-core/src/main/java/org/apache/doris/nereids/util/TypeCoercionUtils.java for IP types
  4. add 'set enable_nereids_planner=true' and 'enable_fallback_to_original_planner=false' in groovy regression test cases and verify the result

@github-actions github-actions bot added the approved label Sep 28, 2023
@github-actions
PR approved by at least one committer and no changes requested.

sjyango commented Sep 28, 2023

In general, there is more work to do for Nereids (at least):

  1. Add literal definition of IPV4 and IPV6 like old planner
  2. modify fe/fe-core/src/main/java/org/apache/doris/nereids/types/DataType.java for IP types
  3. modify fe/fe-core/src/main/java/org/apache/doris/nereids/util/TypeCoercionUtils.java for IP types
  4. add 'set enable_nereids_planner=true' and 'enable_fallback_to_original_planner=false' in groovy regression test cases and verify the result

Thank you for reviewing my code. I will complete this work during the National Day holiday!

@github-actions github-actions bot removed the approved label Sep 28, 2023

sjyango commented Oct 17, 2023

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 191. Check the log or trigger a new build to see more.

@doris-robot
TeamCity be ut coverage result:
Function Coverage: 36.88% (8282/22457)
Line Coverage: 28.99% (66449/229242)
Region Coverage: 27.67% (34481/124594)
Branch Coverage: 24.35% (17557/72110)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ac31834c8057e656af9e9be6c3125c341e7efb04_ac31834c8057e656af9e9be6c3125c341e7efb04/report/index.html

@doris-robot
(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.25 seconds
stream load tsv: 556 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17161923223 Bytes

sjyango commented Oct 18, 2023

run p0

sql """ SET enable_fallback_to_original_planner=false """

sql """
CREATE TABLE test_unique_ip_crud (

For DML statements, even with 'SET enable_fallback_to_original_planner=false', Nereids will fall back to the old planner if it meets any error. So to support the ipv4 and ipv6 keywords in Nereids, we need to modify the g4 grammar files:
src/main/antlr4/org/apache/doris/nereids/DorisLexer.g4
src/main/antlr4/org/apache/doris/nereids/DorisParser.g4
and check that visitCreateTable in org/apache/doris/nereids/parser/LogicalPlanBuilder.java works.

checkValueValid is missing in Nereids. Should it be consistent with the old planner?
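
For context, a validity check for IP literals could look roughly like the Python sketch below. This is an assumption about what such a check does, not the `checkValueValid` code from the old planner; `is_valid_ip_literal` and its `kind` parameter are hypothetical names.

```python
import ipaddress


def is_valid_ip_literal(text: str, kind: str) -> bool:
    """Return True if `text` parses as an address of the given kind.

    `kind` is "ipv4" or "ipv6" (case-insensitive); anything else is an error.
    """
    parsers = {
        "ipv4": ipaddress.IPv4Address,
        "ipv6": ipaddress.IPv6Address,
    }
    parser = parsers.get(kind.lower())
    if parser is None:
        raise ValueError(f"unsupported IP type: {kind}")
    try:
        parser(text)
    except ValueError:
        # The ipaddress constructors raise a ValueError subclass on bad input.
        return False
    return True


if __name__ == "__main__":
    print(is_valid_ip_literal("127.0.0.1", "ipv4"))  # True
    print(is_valid_ip_literal("127.0.0.1", "ipv6"))  # False
    print(is_valid_ip_literal("::1", "IPv6"))        # True
```

Running the same check in both planners would keep a literal that one planner accepts from being rejected by the other.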

sjyango commented Oct 26, 2023

run buildall

@doris-robot
(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.49 seconds
stream load tsv: 557 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162053796 Bytes

sjyango commented Oct 26, 2023

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 95. Check the log or trigger a new build to see more.

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 71. Check the log or trigger a new build to see more.

sjyango commented Oct 26, 2023

run clickbench-new

sjyango commented Oct 26, 2023

run p0

sjyango commented Oct 26, 2023

run p1

sjyango commented Oct 26, 2023

run p0

@doris-robot
(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.18 seconds
stream load tsv: 557 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162106699 Bytes

Labels
approved, meta-change, reviewed
5 participants