[feature] allow dim table config to detect/disallow duplicate PK #12290
[feature] allow dim table config to detect/disallow duplicate PK #12290walterddr merged 3 commits intoapache:masterfrom
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #12290 +/- ##
============================================
- Coverage 61.58% 61.57% -0.01%
- Complexity 1152 1153 +1
============================================
Files 2415 2417 +2
Lines 131476 131611 +135
Branches 20303 20315 +12
============================================
+ Hits 80963 81046 +83
- Misses 44597 44629 +32
- Partials 5916 5936 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
| private final AtomicInteger _loadToken = new AtomicInteger(); | ||
|
|
||
| private boolean _disablePreload = false; | ||
| private boolean _allowDuplicatePrimaryKey = true; |
There was a problem hiding this comment.
Using true as default boolean config might be error prone. Consider changing it to errorOnDuplicatePrimaryKey and default to false?
There was a problem hiding this comment.
good point. will fix
use false as default for disallowed config key; improve test
| private final AtomicInteger _loadToken = new AtomicInteger(); | ||
|
|
||
| private boolean _disablePreload = false; | ||
| private boolean _disallowDuplicatePrimaryKey = false; |
There was a problem hiding this comment.
Still suggest naming it _errorOnDuplicatePrimaryKey. We shouldn't allow duplicate primary key in any case since it will result in inconsistent behavior.
- PR apache#12290 touched dimBaseballTeams.csv on which the tests depends on. Modified the test to reflect that. - Changed import statements (addressed comment).
… Disk Issues (#12220) * modularized map, reduce and segment generation phase. * added interfaces and classes * added staetefulrecordreaderfileconfig instead of recordreaderfileconfig * save Progress * major changes, working tests. * more changes * made the number of bytes to be configurable * added global sequence id to segments since it is modularised out, made the intermediate file size configurable through segmentsConfig. * remove logic for global segment IDs. * used recordreaderconfig instead of statefulrecordreaderconfig, changed looping conditions, made recordreader of recordreaderfileconfig modifiable. * added proper sequence ids for segments. * ingestion working without errors. Should be good in sunny day scenarios. * remove typo from testing * replace _partitionToFilemanagerMap with local variable asthe mapping is happening in multiple iterations. Change exception handling for mapper. * added precondition check for intermediateFileSizeThresholdInBytes and inferred observer from segmentsConfig * fix typo and add comment. * removed redundant method. * config related changes, fix unit tests. * remove redundant flag. * replaced size based constraint checker with size based constraint writer. * consolidated all the constraint checks into one place. * add test to validate segmentprocessorframework * decoupled recordreader closing logic from segment mapper * addressing comments related to segmentprocessorframework. * delegated initialisation and check for recordreader being done with all the records to recordreaderfileconfig. * simplify logic for termination of map phase. * change variable names for better readability. * Added tests and minor changes in logic - Added further tests for SegmentProcessorFramework. - Removed redundant checks for SegmentProcessorFramework. - Detected an edge case and added an additional statement in SegmentMapper. * isolated the termination logs to mapAndTransformRow. * Keep SegmentMapper public interface unchanged - Pass the total count of RecordReaders through SegmentMapper Constructor instead of map() arguments. * addressing review comments. * cleaned up logging logic. * Channged Logs * kept the logging for default scenario same as before * conditionally logged the progress between iterations in case the feature is enabled. * modify tests - PR #12290 touched dimBaseballTeams.csv on which the tests depends on. Modified the test to reflect that. - Changed import statements (addressed comment).
this addresses: #12284