Allow null consume in BatchPushSource #7573

Merged
merged 14 commits into from
Jul 17, 2020

srkukarni
Contributor

(If this PR fixes a github issue, please add Fixes #<xyz>.)

Fixes #

(or if this PR is one task of a github issue, please add Master Issue: #<xyz> to link to the master issue.)

Master Issue: #

Motivation

BatchSource allows a source to return a null record to indicate that the batch is done.
For BatchPushSource, since we are using LinkedBlockingQueue, which rejects null elements, users cannot simply pass a null value. Thus we need a special mechanism to indicate the end of a batch.
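
The constraint described above comes from the JDK itself: LinkedBlockingQueue throws NullPointerException when asked to store a null element. The following is a minimal sketch that demonstrates this; the class name NullQueueDemo and its helper method are hypothetical and exist only for illustration, they are not part of Pulsar.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class, not part of Pulsar.
class NullQueueDemo {
    // Returns true if the queue refuses to store a null element.
    static boolean rejectsNull() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            queue.put(null); // the JDK specifies a NullPointerException here
            return false;
        } catch (NullPointerException expected) {
            return true;
        } catch (InterruptedException e) {
            // put() can block on a bounded queue, so the checked exception must be handled.
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("queue rejects null: " + rejectsNull());
    }
}
```

Because null cannot travel through the queue, the end-of-batch signal has to be carried by some non-null placeholder object.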

Modifications

Describe the modifications you've done.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)
  • The rest endpoints: (yes / no)
  • The admin cli options: (yes / no)
  • Anything that affects deployment: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

@srkukarni srkukarni added this to the 2.7.0 milestone Jul 17, 2020
@srkukarni srkukarni requested review from merlimat and jerrypeng July 17, 2020 05:47
@srkukarni srkukarni self-assigned this Jul 17, 2020
if (record != null) {
    queue.put(record);
} else {
    queue.put(new NullRecord());
Contributor

A small optimization: just declare a single final NullRecord variable and reuse it instead of allocating a new instance each time.

Contributor Author

done
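
The reviewer's suggestion could look roughly like the sketch below: one shared marker instance is allocated once, pushed whenever the caller passes null, and recognized on the reader side by identity comparison. This is not the merged Pulsar code. The names Item, SentinelQueue and END_OF_BATCH are hypothetical, and Item is a stand-in for a user record type rather than Pulsar's Record interface.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a user record type; not Pulsar's Record interface.
final class Item {
    final String value;
    Item(String value) { this.value = value; }
}

// Hypothetical sketch of a push-style source backed by a blocking queue.
class SentinelQueue {
    // Allocated once and reused for every end-of-batch signal.
    private static final Item END_OF_BATCH = new Item(null);

    private final LinkedBlockingQueue<Item> queue = new LinkedBlockingQueue<>();

    // Producer side: a null argument means "this batch is done".
    void consume(Item record) {
        try {
            queue.put(record != null ? record : END_OF_BATCH);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Reader side: blocks until an element arrives, then maps the marker back to null.
    Item readNext() {
        try {
            Item next = queue.take();
            return next == END_OF_BATCH ? null : next;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SentinelQueue source = new SentinelQueue();
        source.consume(new Item("first"));
        source.consume(null);
        System.out.println(source.readNext().value); // prints "first"
        System.out.println(source.readNext());       // prints "null"
    }
}
```

Reusing one instance avoids an allocation per batch and makes the identity check (==) sufficient, so no instanceof test is needed in this sketch.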

@@ -32,26 +30,43 @@
*/
public abstract class BatchPushSource<T> implements BatchSource<T> {

    private static class NullRecord implements Record {
Contributor

@david-streamlio david-streamlio Jul 17, 2020

Wouldn't it be easier to allow users to place a null value inside their existing Record class and then test for that condition rather than creating a NullRecord class? e.g.

private static final boolean isNull(Record rec) { return (rec == null) || (rec.getValue() == null); }

Then you could just call this method instead of using the instanceof NullRecord check.

Contributor

The question is whether a record with a null value is the same as returning null. Can a record have a null value but other fields (e.g. a key) with valid values?

Contributor

The Null object is currently only inserted if the record passed in is null, so there aren't any fields.

Contributor

Letting users continue to use their own Record types, with a null value inside the record, seems like a more intuitive approach to me.

Contributor Author

That is the approach that's taken here. Users will use their own record types and, when they are done with the task, call consume(null) to signify the end of the task. NullRecord is a strictly private class, an internal implementation detail of BatchPushSource.
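
The difference between the two positions in this thread can be sketched as follows. A value-based check in the style of the suggested isNull helper treats any record with a null value as the end of the batch, while a private marker compared by identity only matches the object inserted for consume(null). Everything here is an assumption for illustration: KeyedRecord, MarkerVersusNullValue and their methods are hypothetical, and whether real Pulsar records carry a key with a null value is exactly the open question raised above.

```java
// Hypothetical record shape for illustration; Pulsar's real Record interface differs.
final class KeyedRecord {
    final String key;
    final String value; // assumed to be allowed to be null
    KeyedRecord(String key, String value) { this.key = key; this.value = value; }
}

// Hypothetical comparison of the two detection strategies discussed in the review.
class MarkerVersusNullValue {
    // Private marker, never handed out to user code as a real record.
    private static final KeyedRecord END = new KeyedRecord(null, null);

    // Mirrors the wrapping step: null from the caller becomes the marker.
    static KeyedRecord wrap(KeyedRecord rec) {
        return rec != null ? rec : END;
    }

    // Value-based check, modeled on the isNull helper suggested in the review.
    static boolean isNullByValue(KeyedRecord rec) {
        return rec == null || rec.value == null;
    }

    // Identity-based check: true only for the private marker itself.
    static boolean isEndMarker(KeyedRecord rec) {
        return rec == END;
    }

    public static void main(String[] args) {
        KeyedRecord keyOnly = new KeyedRecord("user-1", null);
        System.out.println(isNullByValue(keyOnly));  // prints "true"
        System.out.println(isEndMarker(keyOnly));    // prints "false"
        System.out.println(isEndMarker(wrap(null))); // prints "true"
    }
}
```

In this sketch the key-only record would be mistaken for an end-of-batch signal by the value-based check, whereas the identity check leaves it alone and only reacts to the marker produced from a null argument.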

@srkukarni srkukarni merged commit 4e1a677 into apache:master Jul 17, 2020
@srkukarni srkukarni deleted the allow_null_consume branch July 17, 2020 22:19
merlimat pushed a commit to merlimat/pulsar that referenced this pull request Jul 21, 2020
* Added upgrade notes

* Allow null message to be passed

* More private impl

* Fix unittest

* Address comments

Co-authored-by: Sanjeev Kulkarni <sanjeevk@splunk.com>
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020

3 participants