Conversation
Jackie-Jiang
commented
Feb 9, 2019
- Make Hadoop Jobs easier to extend
- Add interface ControllerRestApi to provide APIs for config fetching and segment pushing
- Add BaseSegmentJob abstract class for common segment related jobs
- In SegmentCreationJob, allow plugable mapper class and additional job properties
- In SegmentCreationMapper, allow additional segment generator configs
- Tested with Hadoop integration test
- Make Hadoop Jobs easier to extend - Add interface ControllerRestApi to provide APIs for config fetching and segment pushing - Add BaseSegmentJob abstract class for common segment related jobs - In SegmentCreationJob, allow plugable mapper class and additional job properties - In SegmentCreationMapper, allow additional segment generator configs - Tested with Hadoop integration test
Codecov Report
@@ Coverage Diff @@
## master #3813 +/- ##
============================================
- Coverage 67.1% 67.05% -0.06%
Complexity 4 4
============================================
Files 1027 1027
Lines 50819 50816 -3
Branches 7091 7091
============================================
- Hits 34102 34073 -29
- Misses 14392 14406 +14
- Partials 2325 2337 +12
Continue to review full report at Codecov.
|
|
|
||
|
|
||
| public abstract class BaseSegmentJob extends Configured { | ||
| protected final Logger _logger = LoggerFactory.getLogger(getClass()); |
There was a problem hiding this comment.
I think that we should pick either LOGGER or _logger and make them consistent. I think that we have both.
There was a problem hiding this comment.
LOGGER is static final, while _logger is non-static. The reason why this one is non-static is because we want the name shows as the actual class instead of the abstract class.
| // Optional | ||
| _depsJarDir = getPathFromProperty(JobConfigConstants.PATH_TO_DEPS_JAR); | ||
| _schemaFile = getPathFromProperty(JobConfigConstants.PATH_TO_SCHEMA); | ||
| _defaultPermissionsMask = _properties.getProperty(JobConfigConstants.DEFAULT_PERMISSIONS_MASK); |
There was a problem hiding this comment.
This will make _defaultPermissionsMask to become null if key does not exist?
There was a problem hiding this comment.
Yes, correct. We'll do null check before applying it.
| private PushLocation(PushLocationBuilder pushLocationBuilder) { | ||
| _host = pushLocationBuilder._host; | ||
| _port = pushLocationBuilder._port; | ||
| public PushLocation(String host, int port) { |
There was a problem hiding this comment.
This change may break hadoop jobs for people out side if they do something similar to PBNJ (importing pinot-hadoop jar for their hadoop job project). For now, I think that this is fine; however, it's going to be hard for us to change the constructor in very near future. Once we publish our jar files to public maven repo, this kind of change should happen in multiple steps. (Add constructor, add deprecated annotation, remove them with the major version bump).
There was a problem hiding this comment.
Good point. Once we go with official release, these kind of changes need to be out in multiple steps.
|
|
||
| // Run segment creation job | ||
| SegmentCreationJob creationJob = new SegmentCreationJob("TestSegmentCreation", properties); | ||
| SegmentCreationJob creationJob = new SegmentCreationJob(properties); |
There was a problem hiding this comment.
I think that it's better to expose JobName to be configurable. It's fine to include that within properties and read it back along with other config instead of having it as an input variable for constructor.
There was a problem hiding this comment.
Job name is inferred in the job class. Other properties should be passed through properties.