Add memory optimized dimension table#9802
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9802 +/- ##
=============================================
- Coverage 64.11% 15.80% -48.32%
+ Complexity 5385 175 -5210
=============================================
Files 1903 1923 +20
Lines 102554 103482 +928
Branches 15604 15758 +154
=============================================
- Hits 65753 16354 -49399
- Misses 32017 85936 +53919
+ Partials 4784 1192 -3592
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Shall we add a test for it?
pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTable.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/LookupRecordLocation.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/data/manager/offline/MemoryOptimizedDimensionTable.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/pinot/core/data/manager/offline/MemoryOptimizedDimensionTable.java
Outdated
Show resolved
Hide resolved
af6d8df to
d157124
Compare
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
LGTM. Please reformat the changes
.../src/main/java/org/apache/pinot/core/data/manager/offline/MemoryOptimizedDimensionTable.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
| if (lookupRecordLocation == null) { | ||
| return null; | ||
| } | ||
| return lookupRecordLocation.getRecord(_reuse); |
There was a problem hiding this comment.
Is this thread safe? _reuse being shared across .get calls?
There was a problem hiding this comment.
I think we need to clear the _reuse before passing it to .getRecord.
That's what we do in other places in the code.
There was a problem hiding this comment.
get can be called concurrently. Think this should be ThreadLocal<GenericRow> _reuse instead
There was a problem hiding this comment.
Good catch! I think this is a bug
Currently, we store the whole row in the Dimension table hash map for fast lookups.
However, in some cases, we can trade off the speed for memory efficiency.
In this PR, I am adding a new implementation that only stores the segment reference and docID, and the actual value is read at the time of lookup.
I have also created another design where I am using different DataManager implementations instead of DataTable implementations. It touches a lot of code paths though hence I didn't raise PR for that approach. The code can be seen here
https://github.com/KKcorps/incubator-pinot/pull/23/files
Release Notes
Disable preloading while using Dimension tables to save memory
Documentation
add the following config in tableConfig json to disable preloading
{ "dimensionTableConfig": { "disablePreload": true } }