[Spark Load][Bug] Keep the column splitting in spark load consistent with broker load / mini load #4532
@@ -640,6 +642,22 @@ private StructType createScrSchema(List<String> srcColumns) {
        return srcSchema;
    }

    // This method is to keep the splitting consistent with broker load / mini load
    private String[] splitLine(String line, char sep) {
If line is an empty string, this method should return an empty string array.
But here you will return a string array with one empty string in it.
fixed it
LGTM
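For reference, the behavior requested in the review (an empty line yields an empty array, not an array holding one empty string) could look roughly like the sketch below. This is not the code merged in the PR: the class name SplitLineSketch, the loop, and the sample lines are assumptions made for illustration. Only the splitLine(String line, char sep) signature comes from the diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical holder class; in the PR, splitLine is a private method of an existing class.
class SplitLineSketch {

    // Splits a line on every occurrence of sep, keeping empty leading and
    // trailing fields. An empty (or null) line yields an empty array.
    static String[] splitLine(String line, char sep) {
        if (line == null || line.isEmpty()) {
            return new String[0];
        }
        List<String> values = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == sep) {
                values.add(line.substring(start, i));
                start = i + 1;
            }
        }
        // The text after the last separator is always a field, even if it is empty.
        values.add(line.substring(start));
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Sample lines are made up for this sketch.
        System.out.println(splitLine("", '|').length);                   // 0
        System.out.println(Arrays.toString(splitLine("a|b|c|d", '|')));  // [a, b, c, d]
        System.out.println(Arrays.toString(splitLine("a|b|c|d|", '|'))); // [a, b, c, d, ]
    }
}
```

Checking for the empty line before the loop keeps the reviewer's case separate from the general case, where the text after the last separator is always emitted as a field.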
…with broker load / mini load (#4532)
In this example, I use the csv module (import csv, then with open('input_file.csv') as f:) to read the input data and split it into columns using the | delimiter. I then check the first and last columns of each row for extra characters and remove them if necessary.
Proposed changes
Consider source data with 4 columns:
Given the same column terminator '|', broker load determines that it has 5 columns, while Spark load determines that it has 4 columns.
And here is another source with 4 columns:
Given the same column terminator '|', both broker load and Spark load determine that it has 4 columns.
To Reproduce
Steps to reproduce the behavior:
The reason for this bug
This is because Spark load does not treat the first and last characters of a line as delimiters.
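As an illustration of how two splitters can disagree on the column count for the same line, the sketch below compares Java's String.split with its default limit, which drops trailing empty strings, against the same call with limit -1, which keeps them. Whether Spark load previously relied on String.split is an assumption here, the sample lines are invented, and the sketch only covers a delimiter at the end of a line.

```java
// Hypothetical demo class; names and sample lines are assumptions for illustration.
class SplitCountDemo {

    // Column count when trailing empty strings are discarded
    // (String.split with the default limit of 0).
    static int countDroppingTrailing(String line) {
        return line.split("\\|").length;
    }

    // Column count when every '|' separates two fields, including a '|'
    // at the end of the line (limit -1 keeps trailing empty strings).
    static int countKeepingAll(String line) {
        return line.split("\\|", -1).length;
    }

    public static void main(String[] args) {
        String withTrailing = "a|b|c|d|";
        String plain = "a|b|c|d";
        System.out.println(countDroppingTrailing(withTrailing)); // 4
        System.out.println(countKeepingAll(withTrailing));       // 5
        System.out.println(countDroppingTrailing(plain));        // 4
        System.out.println(countKeepingAll(plain));              // 4
    }
}
```

The two counts differ only when the line ends with the delimiter, which is consistent with the 5-versus-4 discrepancy described above.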
Types of changes
What types of changes does your code introduce to Doris?
Put an x in the boxes that apply.
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.