[CARBONDATA-368]Improve performance of dataframe loading #278

Closed
wants to merge 4 commits into from

QiangCai
Contributor

@QiangCai QiangCai commented Nov 2, 2016

  1. Support reading the DataFrame concurrently in the CSVInput step.

  2. Fix a bug when reading the DataFrame in the Kettle thread.

  3. Add a custom DataLoadPartitionCoalescer to repartition the input DataFrame without a shuffle.
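
As a rough illustration of the idea behind item 3, here is a minimal sketch of coalescing by data locality: partitions that share a preferred host are merged into one group, so no rows have to move between nodes. This is not the code from this PR. The names `InputPartition`, `CoalesceSketch` and `coalesceByHost` are hypothetical, and the real DataLoadPartitionCoalescer may work quite differently.

```scala
// Hypothetical sketch, not the PR's implementation: merge partitions that
// live on the same host so the partition count drops without a shuffle.
final case class InputPartition(index: Int, preferredHost: String)

object CoalesceSketch {
  // Returns one group of partition indices per host, ordered by host name.
  def coalesceByHost(parts: Seq[InputPartition]): Seq[(String, Seq[Int])] =
    parts
      .groupBy(_.preferredHost)   // host -> partitions located on that host
      .toSeq
      .sortBy(_._1)               // deterministic order of the merged groups
      .map { case (host, ps) => host -> ps.map(_.index).sorted }
}
```

Spark's built-in `rdd.coalesce(n, shuffle = false)` also reduces the partition count without a shuffle. A custom coalescer is presumably useful when the grouping has to follow the data-load node layout more closely.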

@QiangCai QiangCai changed the title from "[CARBONDATA-85][WIP] support insert into carbon table select from table" to "[CARBONDATA-368]Improve performance of dataframe loading" Nov 3, 2016
@jackylk
Contributor

jackylk commented Nov 3, 2016

Some test cases failed after rebasing onto master.

@QiangCai
Contributor Author

QiangCai commented Nov 9, 2016

I have fixed the test case errors.
CI now passes:
http://136.243.101.176:8080/job/ApacheCarbonManualPRBuilder/595/

format: SimpleDateFormat,
level: Int = 1): String = {
value == null match {
case true => serializationNullFormat
Contributor

Please use if/else here instead of matching on a Boolean.

Contributor Author

fixed
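
For readers following the thread, the requested change could look roughly like the sketch below. It only illustrates swapping the Boolean `match` for `if`/`else`. The object and method names and the reduced parameter list are assumptions, not the actual signature from the diff.

```scala
object NullFormatSketch {
  // Hypothetical, simplified helper: the real method in the diff takes more
  // parameters (for example a SimpleDateFormat and a level).
  def render(value: Any, serializationNullFormat: String): String =
    if (value == null) {
      // Same result as `value == null match { case true => ... }`
      serializationNullFormat
    } else {
      value.toString
    }
}
```

A plain `if`/`else` states the intent directly and avoids a pattern match on `true`/`false`.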

@jackylk
Contributor

jackylk commented Nov 19, 2016

Please rebase.

@QiangCai
Contributor Author

Rebase done

QiangCai and others added 3 commits November 28, 2016 21:18
DataLoadPartitionCoalescer

concurrently read dataframe
@@ -548,77 +552,53 @@ class DataFrameLoaderRDD[K, V](
override protected def getPartitions: Array[Partition] = firstParent[Row].partitions
}

class PartitionIterator(partitionIter: Iterator[DataLoadPartitionWrap[Row]],
carbonLoadModel: CarbonLoadModel,
context: TaskContext) extends JavaRddIterator[JavaRddIterator[Array[String]]] {
Contributor

remove extra space

Contributor Author

fixed

@@ -932,7 +942,8 @@ object CarbonDataRDDFactory {
loadDataFile()
}
val newStatusMap = scala.collection.mutable.Map.empty[String, String]
status.foreach { eachLoadStatus =>
if (status.nonEmpty) {
status.foreach { eachLoadStatus =>
val state = newStatusMap.get(eachLoadStatus._1)
Contributor

incorrect indentation

Contributor Author

fixed

}
return null;
}
};
Contributor

remove ;

Contributor Author

fixed

@jackylk
Contributor

jackylk commented Nov 29, 2016

@asfgit asfgit closed this in 567fa51 Nov 29, 2016
@QiangCai QiangCai deleted the loaddataframe branch May 12, 2017 01:52
Beyyes pushed a commit to Beyyes/carbondata that referenced this pull request Jul 12, 2018
Move graf‘s json.template to bootstrap