Skip to content

[SPARK-27494][SS] Null keys/values don't work in Kafka source v2 #24441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

uncleGen
Copy link
Contributor

@uncleGen uncleGen commented Apr 23, 2019

What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null keys or values.

  • When processing a null key, all of the following keys in the same partition will be null. This is a correctness bug.
  • When processing a null value, it will throw NPE.

How was this patch tested?

add new unit tests

@uncleGen uncleGen changed the title Null values don't work in Kafka source v2 [SPARK-27494][SS] Null values don't work in Kafka source v2 Apr 23, 2019
@SparkQA
Copy link

SparkQA commented Apr 23, 2019

Test build #104829 has finished for PR 24441 at commit 5cb9502.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

pending on adding new unit test

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104852 has finished for PR 24441 at commit 74bfb87.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104858 has finished for PR 24441 at commit aadad55.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Apr 24, 2019

Test build #104864 has finished for PR 24441 at commit aadad55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

good catch!

@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104885 has finished for PR 24441 at commit b6e9e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104886 has finished for PR 24441 at commit 2f30711.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104889 has finished for PR 24441 at commit 9dd38eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104891 has finished for PR 24441 at commit 45330de.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 25, 2019

Test build #104893 has finished for PR 24441 at commit d91977d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

Although SPARK-27494 is reported on only v2, can we have a test coverage for Kafka source v1 too?

@uncleGen
Copy link
Contributor Author

uncleGen commented Apr 25, 2019

@dongjoon-hyun make sense, what about finishing it in a follow-up pr? v1 is already tested in KafkaMicroBatchV1SourceSuite

@cloud-fan
Copy link
Contributor

@dongjoon-hyun v1 is already tested because the test is in KafkaMicroBatchSourceSuiteBase

@uncleGen uncleGen closed this Apr 25, 2019
@uncleGen uncleGen reopened this Apr 25, 2019
Copy link
Contributor

@gaborgsomogyi gaborgsomogyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor things found, basically looks good.

@dongjoon-hyun
Copy link
Member

Got it. Thanks, @uncleGen and @cloud-fan !

@SparkQA
Copy link

SparkQA commented Apr 26, 2019

Test build #104920 has finished for PR 24441 at commit d2ffea1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 26, 2019

Test build #104921 has finished for PR 24441 at commit 174321a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@uncleGen
Copy link
Contributor Author

@cloud-fan Take a second glance please.

Copy link
Member

@zsxwing zsxwing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cloud-fan cloud-fan closed this in d2656aa Apr 26, 2019
cloud-fan pushed a commit that referenced this pull request Apr 26, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes #24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

thanks, merging to master/2.4!

@dongjoon-hyun
Copy link
Member

Thank you all!

@zsxwing zsxwing changed the title [SPARK-27494][SS] Null values don't work in Kafka source v2 [SPARK-27494][SS] Null keys/values don't work in Kafka source v2 Apr 26, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
## What changes were proposed in this pull request?

Right now Kafka source v2 doesn't support null values. The issue is in org.apache.spark.sql.kafka010.KafkaRecordToUnsafeRowConverter.toUnsafeRow which doesn't handle null values.

## How was this patch tested?

add new unit tests

Closes apache#24441 from uncleGen/SPARK-27494.

Authored-by: uncleGen <hustyugm@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit d2656aa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants