-
Notifications
You must be signed in to change notification settings - Fork 704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CARBONDATA-458]Improving First time query performance #265
[CARBONDATA-458]Improving First time query performance #265
Conversation
25b1653
to
ccf00b8
Compare
dc1ff07
to
eacc8d4
Compare
eacc8d4
to
668de2d
Compare
668de2d
to
1d1b4da
Compare
@@ -74,7 +74,7 @@ | |||
* @param blockIndexes indexes of the blocks need to be read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add more description of blockIndexes, it is two dimensional array, what does each dimension mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
this.filePath = FileFactory.getUpdatedFilePath(filePath); | ||
this.blockOffset = blockOffset; | ||
this.segmentId = segmentId; | ||
this.locations = locations; | ||
this.blockLength = blockLength; | ||
this.blockletInfos = blockletInfos; | ||
this.version = version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere should check the validity of the version, where it is checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validateCarbonDataFileVersion method is CrabonProperties I am validating the version
public DimensionColumnChunkReader getDimensionColumnChunkReader(short version, | ||
BlockletInfo blockletInfo, int[] eachColumnValueSize, String filePath) { | ||
switch (version) { | ||
case 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have enum for version
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.carbondata.core.carbon.datastore.chunk.reader.dimension; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we create a different package for V2 reader? so that V1 and V2 package is separated, I think it will be more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will move and rename all the classes as per your suggestion
/** | ||
* Compressed dimension chunk reader class for version 2 | ||
*/ | ||
public class CompressedDimensionChunkFileBasedReader2 extends AbstractChunkReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to CompressedDimensionChunkFileBasedReaderV2
System.arraycopy(data, copySourcePoint, compressedIndexPage, 0, | ||
dimensionColumnChunk.rowid_page_length); | ||
copySourcePoint += dimensionColumnChunk.rowid_page_length; | ||
invertedIndexes = CarbonUtil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move CarbonUtil
to next line and break at paramenters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
import org.apache.carbondata.core.carbon.metadata.blocklet.datachunk.DataChunk; | ||
import org.apache.carbondata.core.datastorage.store.compression.ValueCompressionModel; | ||
import org.apache.carbondata.core.datastorage.store.compression.ValueCompressonHolder; | ||
import org.apache.carbondata.core.datastorage.store.compression.ValueCompressonHolder.UnCompressValue; | ||
|
||
/** | ||
* Measure block reader abstract class | ||
*/ | ||
public abstract class AbstractMeasureChunkReader implements MeasureColumnChunkReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this class is not implementing any interface function of MeasureColumnChunkReader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is not implementing any method but in future it may have some common method
/** | ||
* Class to read the measure column data for version 2 | ||
*/ | ||
public class CompressedMeasureChunkFileReader2 extends AbstractMeasureChunkReader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename all V2 class to end with V2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
byte[] data = null; | ||
byte[] dataPage = null; | ||
if (measureColumnChunkOffsets.size() - 1 == blockIndex) { | ||
measureDataChunk = fileReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move fileReader to next line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -38,7 +38,7 @@ | |||
/** | |||
* version used for data compatibility | |||
*/ | |||
private int versionId; | |||
private short versionId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change version from int to short?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
short range is enough to store version. Please let me know if need to store in int is required
/** | ||
* current data file version | ||
*/ | ||
public static final short CARBON_DATA_FILE_CURRENT_VERSION = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not easy to understand what current means, can you change to a more meaning name, and use enum for version number.
The default version is 1 or 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i will change it to some meaning full name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change 2 to enum also
/** | ||
* If the level 2 compaction is done in minor then new compacted segment will end with .2 | ||
*/ | ||
public static String LEVEL2_COMPACTION_INDEX = ".2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need this in this PR, is't it for compaction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By minstake i have checked in, will remove this constant
case DIRECT_DICTIONARY: | ||
return Encoding.DIRECT_DICTIONARY; | ||
default: | ||
return Encoding.DICTIONARY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really default or we should throw exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will fix
please add testcase for:
|
@jackylk all the existing test cases are running with V2 and some I have updated with V1 version. ok I will add test cases for reading V1 version file |
* if parameter is invalid current version will be set | ||
*/ | ||
private void validateCarbonDataFileVersion() { | ||
short carbondataFileVersion = CarbonCommonConstants.CARBON_DATA_FILE_CURRENT_VERSION; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this initial value is of no use
.setProperty(CarbonCommonConstants.CARBON_DATA_FILE_VERSION, carbondataFileVersion + ""); | ||
} | ||
if (carbondataFileVersion > CarbonCommonConstants.CARBON_DATA_FILE_CURRENT_VERSION | ||
|| carbondataFileVersion < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checking is not correct, we should also 1 and 2, but not 0, 1, 2
configuration.set(CarbonInputFormat.INPUT_SEGMENT_NUMBERS, | ||
CarbonUtil.getSegmentString(validSegments)); | ||
configuration | ||
.set(CarbonInputFormat.INPUT_SEGMENT_NUMBERS, CarbonUtil.getSegmentString(validSegments)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
break the line at parameter
.set(CarbonInputFormat.INPUT_SEGMENT_NUMBERS, CarbonUtil.getSegmentString(validSegments)); | ||
} | ||
|
||
private static AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configuration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to getIdentifier
to make it shorter
@@ -193,8 +203,7 @@ public static void setSegmentsToAccess(Configuration configuration, List<String> | |||
* @return List<InputSplit> list of CarbonInputSplit | |||
* @throws IOException | |||
*/ | |||
@Override | |||
public List<InputSplit> getSplits(JobContext job) throws IOException { | |||
@Override public List<InputSplit> getSplits(JobContext job) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move Override
to previous line
e1ca8c8
to
c7a6bc1
Compare
Note |
c7a6bc1
to
7a913d9
Compare
LGTM |
Improving carbon first time query performance
Reason:
Solution:
Note
Since there is change in schema.thrift and hence carbon-format jar has to be build and deployed.