Closed
Description
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
What would you like to be improved?
https://issues.apache.org/jira/browse/SPARK-47085 reported that Scala's Seq.apply has O(n) complexity on a non-IndexedSeq, so an indexed lookup such as val row = rows(idx) inside a loop is quadratic in the number of rows. PR 6077 fixed this for the row-based TRowSet, but TColumnGenerator.getColumnToList was missed.
In my local test, iterating a Hive JDBC statement resultSet (100000 rows, 20+ columns) took 150s with statement.setFetchSize(10000), but only 3s with statement.setFetchSize(100). This is a serious performance issue.
How should we improve?
In TColumnGenerator.getColumnToList, replace the indexed while loop
while (idx < rowSize) {
val row = rows(idx)
...
}
with a foreach like this:
rows.foreach { row =>
....
}
This traverses rows once instead of doing an O(n) lookup per row, which should resolve the issue.
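To illustrate the difference, here is a minimal, self-contained sketch. It is not the Kyuubi code: SeqAccessSketch, sumByIndex, sumByForeach and timeMillis are hypothetical names, and a List of Vector[Int] stands in for the real rows, on the assumption that the actual collection is a linked, non-indexed Seq.

```scala
object SeqAccessSketch {
  // Indexed while loop, mirroring the pattern reported above.
  // On a List, rows(idx) walks idx nodes from the head on every iteration.
  def sumByIndex(rows: Seq[Seq[Int]], ordinal: Int): Long = {
    var total = 0L
    var idx = 0
    val rowSize = rows.length
    while (idx < rowSize) {
      val row = rows(idx)
      total += row(ordinal)
      idx += 1
    }
    total
  }

  // Single traversal: foreach visits each row once, whatever the Seq implementation.
  def sumByForeach(rows: Seq[Seq[Int]], ordinal: Int): Long = {
    var total = 0L
    rows.foreach { row => total += row(ordinal) }
    total
  }

  // Small timing helper (hypothetical); returns the result and elapsed milliseconds.
  def timeMillis[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // List is a linked list, so it is a Seq but not an IndexedSeq.
    val rows: Seq[Seq[Int]] = List.tabulate(20000)(i => Vector(i, i + 1))
    val (byIndex, indexMs) = timeMillis(sumByIndex(rows, 0))
    val (byForeach, foreachMs) = timeMillis(sumByForeach(rows, 0))
    println(s"indexed: $byIndex in $indexMs ms, foreach: $byForeach in $foreachMs ms")
  }
}
```

Both methods return the same sum; only the access pattern differs. Under this model, a batch of n rows costs roughly n * n / 2 node hops per column with the indexed loop, which would explain why a larger fetch size makes the iteration slower rather than faster.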
Are you willing to submit PR?
- Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
- No. I cannot submit a PR at this time.