Skip to content

Refactor Hadoop Jobs#3813

Merged
Jackie-Jiang merged 1 commit intomasterfrom
refactor_hadoop
Feb 12, 2019
Merged

Refactor Hadoop Jobs#3813
Jackie-Jiang merged 1 commit intomasterfrom
refactor_hadoop

Conversation

@Jackie-Jiang
Copy link
Contributor

  • Make Hadoop Jobs easier to extend
  • Add interface ControllerRestApi to provide APIs for config fetching and segment pushing
  • Add BaseSegmentJob abstract class for common segment related jobs
  • In SegmentCreationJob, allow plugable mapper class and additional job properties
  • In SegmentCreationMapper, allow additional segment generator configs
  • Tested with Hadoop integration test

- Make Hadoop Jobs easier to extend
- Add interface ControllerRestApi to provide APIs for config fetching and segment pushing
- Add BaseSegmentJob abstract class for common segment related jobs
- In SegmentCreationJob, allow plugable mapper class and additional job properties
- In SegmentCreationMapper, allow additional segment generator configs
- Tested with Hadoop integration test
@codecov-io
Copy link

Codecov Report

Merging #3813 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3813      +/-   ##
============================================
- Coverage      67.1%   67.05%   -0.06%     
  Complexity        4        4              
============================================
  Files          1027     1027              
  Lines         50819    50816       -3     
  Branches       7091     7091              
============================================
- Hits          34102    34073      -29     
- Misses        14392    14406      +14     
- Partials       2325     2337      +12
Impacted Files Coverage Δ Complexity Δ
...e/pinot/common/utils/FileUploadDownloadClient.java 69.62% <100%> (-0.67%) 0 <0> (ø)
...ot/core/data/readers/ThriftRecordReaderConfig.java 80% <100%> (ø) 0 <0> (ø) ⬇️
...indexsegment/generator/SegmentGeneratorConfig.java 60.94% <100%> (+6.06%) 0 <0> (ø) ⬇️
...a/manager/realtime/RealtimeSegmentDataManager.java 50% <0%> (-50%) 0% <0%> (ø)
...egation/function/customobject/MinMaxRangePair.java 75.86% <0%> (-24.14%) 0% <0%> (ø)
...r/validation/RealtimeSegmentValidationManager.java 42.85% <0%> (-14.29%) 0% <0%> (ø)
...impl/dictionary/DoubleOnHeapMutableDictionary.java 68.88% <0%> (-13.34%) 0% <0%> (ø)
...pinot/core/operator/docidsets/OrBlockDocIdSet.java 84.9% <0%> (-13.21%) 0% <0%> (ø)
...impl/dictionary/FloatOffHeapMutableDictionary.java 81.81% <0%> (-10.91%) 0% <0%> (ø)
...esthandler/ConnectionPoolBrokerRequestHandler.java 77.77% <0%> (-10.5%) 0% <0%> (ø)
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac058c5...e90ffd5. Read the comment docs.

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round



public abstract class BaseSegmentJob extends Configured {
protected final Logger _logger = LoggerFactory.getLogger(getClass());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should pick either LOGGER or _logger and make them consistent. I think that we have both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOGGER is static final, while _logger is non-static. The reason why this one is non-static is because we want the name shows as the actual class instead of the abstract class.

// Optional
_depsJarDir = getPathFromProperty(JobConfigConstants.PATH_TO_DEPS_JAR);
_schemaFile = getPathFromProperty(JobConfigConstants.PATH_TO_SCHEMA);
_defaultPermissionsMask = _properties.getProperty(JobConfigConstants.DEFAULT_PERMISSIONS_MASK);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make _defaultPermissionsMask to become null if key does not exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. We'll do null check before applying it.

private PushLocation(PushLocationBuilder pushLocationBuilder) {
_host = pushLocationBuilder._host;
_port = pushLocationBuilder._port;
public PushLocation(String host, int port) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change may break hadoop jobs for people out side if they do something similar to PBNJ (importing pinot-hadoop jar for their hadoop job project). For now, I think that this is fine; however, it's going to be hard for us to change the constructor in very near future. Once we publish our jar files to public maven repo, this kind of change should happen in multiple steps. (Add constructor, add deprecated annotation, remove them with the major version bump).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Once we go with official release, these kind of changes need to be out in multiple steps.


// Run segment creation job
SegmentCreationJob creationJob = new SegmentCreationJob("TestSegmentCreation", properties);
SegmentCreationJob creationJob = new SegmentCreationJob(properties);
Copy link
Contributor

@snleee snleee Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's better to expose JobName to be configurable. It's fine to include that within properties and read it back along with other config instead of having it as an input variable for constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Job name is inferred in the job class. Other properties should be passed through properties.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants