ARROW-1780 - JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects #1759

atuldambalkar · 2018-03-15T20:35:08Z

This code enhancement converts JDBC ResultSet relational objects to Arrow columnar data Vector objects. The code is under the directory "java/adapter/jdbc/src/main".

The API has the following static methods in the class org.apache.arrow.adapter.jdbc.JdbcToArrow -

public static VectorSchemaRoot sqlToArrow(Connection connection, String query)
public static ArrowDataFetcher jdbcArrowDataFetcher(Connection connection, String tableName)

The utility uses the following data mapping to convert JDBC/SQL data types to Arrow data types -
CHAR --> ArrowType.Utf8
NCHAR --> ArrowType.Utf8
VARCHAR --> ArrowType.Utf8
NVARCHAR --> ArrowType.Utf8
LONGVARCHAR --> ArrowType.Utf8
LONGNVARCHAR --> ArrowType.Utf8
NUMERIC --> ArrowType.Decimal(precision, scale)
DECIMAL --> ArrowType.Decimal(precision, scale)
BIT --> ArrowType.Bool
TINYINT --> ArrowType.Int(8, signed)
SMALLINT --> ArrowType.Int(16, signed)
INTEGER --> ArrowType.Int(32, signed)
BIGINT --> ArrowType.Int(64, signed)
REAL --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
FLOAT --> ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)
DOUBLE --> ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
BINARY --> ArrowType.Binary
VARBINARY --> ArrowType.Binary
LONGVARBINARY --> ArrowType.Binary
DATE --> ArrowType.Date(DateUnit.MILLISECOND)
TIME --> ArrowType.Time(TimeUnit.MILLISECOND, 32)
TIMESTAMP --> ArrowType.Timestamp(TimeUnit.MILLISECOND, timezone=null)
CLOB --> ArrowType.Utf8
BLOB --> ArrowType.Binary
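
The mapping above can be sketched as a plain `switch` over `java.sql.Types` constants. This is only an illustration, not the adapter's code: the class and method names are hypothetical, and the Arrow types are returned as descriptive strings rather than `ArrowType` instances so that the snippet compiles with the JDK alone.

```java
import java.sql.Types;

/** Hypothetical helper mirroring the mapping table; not part of the adapter. */
final class JdbcTypeMappingSketch {

  /** Returns a textual description of the Arrow type for a java.sql.Types code. */
  static String arrowTypeNameFor(int jdbcType, int precision, int scale) {
    switch (jdbcType) {
      case Types.CHAR:
      case Types.NCHAR:
      case Types.VARCHAR:
      case Types.NVARCHAR:
      case Types.LONGVARCHAR:
      case Types.LONGNVARCHAR:
      case Types.CLOB:
        return "Utf8";
      case Types.NUMERIC:
      case Types.DECIMAL:
        // Precision and scale come from the column metadata.
        return "Decimal(" + precision + ", " + scale + ")";
      case Types.BIT:
        return "Bool";
      case Types.TINYINT:
        return "Int(8, signed)";
      case Types.SMALLINT:
        return "Int(16, signed)";
      case Types.INTEGER:
        return "Int(32, signed)";
      case Types.BIGINT:
        return "Int(64, signed)";
      case Types.REAL:
      case Types.FLOAT:
        return "FloatingPoint(SINGLE)";
      case Types.DOUBLE:
        return "FloatingPoint(DOUBLE)";
      case Types.BINARY:
      case Types.VARBINARY:
      case Types.LONGVARBINARY:
      case Types.BLOB:
        return "Binary";
      case Types.DATE:
        return "Date(MILLISECOND)";
      case Types.TIME:
        return "Time(MILLISECOND, 32)";
      case Types.TIMESTAMP:
        return "Timestamp(MILLISECOND, null)";
      default:
        // Types outside the table are rejected in this sketch.
        throw new IllegalArgumentException("Unmapped JDBC type: " + jdbcType);
    }
  }
}
```

In the real adapter the precision and scale would presumably be read from `ResultSetMetaData`. Here they are plain parameters to keep the example small.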

JUnit test cases are under java/adapter/jdbc/src/test. The test cases use an H2 in-memory database.

I am still working on adding and automating additional test cases.


laurentgo

You might want to check the checkstyle results: it looks like the way the code is indented is pretty different from the rest of the Arrow code base...

laurentgo · 2018-05-15T17:04:09Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java

+    public static VectorSchemaRoot sqlToArrow(ResultSet resultSet) throws SQLException, IOException {
+        Preconditions.checkNotNull(resultSet, "JDBC ResultSet object can not be null");
+
+        return sqlToArrow(resultSet, Calendar.getInstance());


Timezone/Locale should always be specified (UTC, Locale.ROOT)?
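
A minimal sketch of what this suggestion could look like, using only JDK classes: build the calendar with an explicit UTC time zone and `Locale.ROOT` instead of relying on the JVM defaults. The class and method names are hypothetical.

```java
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper; illustrates the reviewer's suggestion only. */
final class UtcCalendarSketch {

  /** Builds a calendar pinned to UTC and Locale.ROOT, independent of JVM defaults. */
  static Calendar utcCalendar() {
    return Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
  }
}
```

Such a calendar could then be passed to the `sqlToArrow(resultSet, calendar)` overload that the diff above calls, so that results do not vary with the host's time zone settings.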

laurentgo · 2018-05-15T17:05:31Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrow.java

+
+        RootAllocator rootAllocator = new RootAllocator(Integer.MAX_VALUE);
+        VectorSchemaRoot root = sqlToArrow(resultSet, rootAllocator, calendar);
+        rootAllocator.close();


if the allocator is closed, I guess it means data is invalidated? You might prefer not to provide this method...

The rootAllocator.close() call was already removed, so this comment probably refers to earlier code. We should be okay now.

laurentgo · 2018-05-15T17:09:32Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

+                    case Types.VARBINARY:
+                    case Types.LONGVARBINARY:
+                        updateVector((VarBinaryVector)root.getVector(columnName),
+//                                rs.getBytes(i), !rs.wasNull(), rowCount);


to be removed?

laurentgo · 2018-05-15T17:12:08Z

java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/AbstractJdbcToArrowTest.java

+            }
+
+        } catch (Exception e) {
+            e.printStackTrace();


It should probably be left as-is if you want the test framework to fail properly?

laurentgo · 2018-05-15T17:12:52Z

java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/AbstractJdbcToArrowTest.java

+
+        } catch (Exception e) {
+            e.printStackTrace();
+        } finally {


what about using try(with-resources) pattern?
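
As an illustration of the try-with-resources pattern mentioned here, the sketch below uses a hypothetical `TrackedResource` class as a stand-in for a JDBC `Connection` or `Statement`. It shows that resources are closed automatically, in reverse order of declaration, even when the body throws. The `catch` block exists only so the demo can record the outcome. In a test method one would more likely declare `throws Exception` and let the failure reach the test framework.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo; TrackedResource stands in for a JDBC Connection or Statement. */
final class TryWithResourcesSketch {

  /** Records its own close() call in a shared log. */
  static final class TrackedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    TrackedResource(String name, List<String> log) {
      this.name = name;
      this.log = log;
    }

    @Override
    public void close() {
      log.add("closed " + name);
    }
  }

  /** Opens two resources, fails inside the body, and returns what happened. */
  static List<String> runAndFail() {
    List<String> log = new ArrayList<>();
    try (TrackedResource conn = new TrackedResource("connection", log);
         TrackedResource stmt = new TrackedResource("statement", log)) {
      throw new IllegalStateException("simulated test failure");
    } catch (IllegalStateException e) {
      // Both resources are already closed by the time this block runs.
      log.add("propagated: " + e.getMessage());
    }
    return log;
  }
}
```

No `finally` block is needed: the compiler generates the cleanup, which removes the manual close calls from the test code.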

laurentgo · 2018-05-30T02:07:34Z

java/adapter/jdbc/pom.xml

+
+    </dependencies>
+
+	<build>	


can you check/fix the indentation?

laurentgo · 2018-05-30T02:31:11Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

+    }
+
+    private static void updateVector(BitVector bitVector, boolean value, boolean isNonNull, int rowCount) {
+        NullableBitHolder holder = new NullableBitHolder();


is it better to use the holder vs calling directly bitVector.setSafe(rowCount, isNonNull ? 1 : 0, value ? 1: 0) (cc @siddharthteotia )

I think we can continue to use holder so as to be consistent with other parts of the code. What do you think?

I think it's fine to use any APIs exposed by vectors

laurentgo · 2018-05-30T02:36:00Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java

+                if (read == -1) {
+                    break;
+                }
+                arrowBuf.setBytes(total, new ByteArrayInputStream(bytes, 0, read), read);


I think you don't need to wrap it in a stream; there's a method that accepts a byte[] argument directly
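
The idea can be sketched with a plain `java.nio.ByteBuffer` standing in for the Arrow buffer. That substitution is an assumption made so the snippet runs with the JDK alone, and the exact overload available on `ArrowBuf` is not confirmed here. Each chunk read from the stream is copied straight from the `byte[]` into the target, with no intermediate `ByteArrayInputStream`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Hypothetical demo; ByteBuffer stands in for the Arrow buffer. */
final class DirectCopySketch {

  /** Copies the whole stream into a buffer of the given capacity, chunk by chunk. */
  static ByteBuffer copy(InputStream in, int capacity) {
    ByteBuffer target = ByteBuffer.allocate(capacity);
    byte[] bytes = new byte[4]; // deliberately small so the loop runs several times
    try {
      while (true) {
        int read = in.read(bytes);
        if (read == -1) {
          break;
        }
        // Copy directly from the byte[]; no wrapping stream is needed.
        target.put(bytes, 0, read);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    target.flip(); // switch the buffer from writing to reading
    return target;
  }
}
```

This sketch throws `BufferOverflowException` if the stream holds more bytes than `capacity`. A real implementation would need to grow or pre-size the buffer.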

laurentgo · 2018-05-30T02:50:23Z

java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/JdbcToArrowTestHelper.java

+        assertEquals(rowCount, varBinaryVector.getValueCount());
+
+        for(int j = 0; j < varBinaryVector.getValueCount(); j++){
+            assertEquals(Arrays.hashCode(values[j]), Arrays.hashCode(varBinaryVector.get(j)));


why not check for array equality using assertArrayEquals?
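
To illustrate why comparing hash codes is weaker than comparing contents, the sketch below uses two different byte arrays that happen to share the same `Arrays.hashCode` value. It uses `Arrays.equals` from the JDK so it runs without JUnit. In a test, `assertArrayEquals` would perform the content comparison and report the mismatch. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical demo comparing hash-based and content-based array checks. */
final class ArrayComparisonSketch {

  /** The check used in the diff: compares only the hash codes. */
  static boolean sameHash(byte[] a, byte[] b) {
    return Arrays.hashCode(a) == Arrays.hashCode(b);
  }

  /** Element-by-element comparison, which is what assertArrayEquals verifies. */
  static boolean sameContent(byte[] a, byte[] b) {
    return Arrays.equals(a, b);
  }
}
```

For example, `{0, 31}` and `{1, 0}` both hash to 992 (31 * (31 * 1 + 0) + 31 and 31 * (31 * 1 + 1) + 0), so the hash-based check passes while the arrays differ.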

laurentgo · 2018-05-30T02:55:50Z

java/adapter/jdbc/src/test/resources/h2/test1_all_datatypes_h2.yml

+    binary_field12 BINARY(100), varchar_field13 VARCHAR(256), blob_field14 BLOB, clob_field15 CLOB, char_field16 CHAR(16), bit_field17 BIT);'
+
+data:
+  - 'INSERT INTO table1 VALUES (101, 1, 45, 12000, 92233720, 17345667789.23, 56478356785.345, 56478356785.345, PARSEDATETIME(''12:45:35 GMT'', ''HH:mm:ss z''),


I noticed that all the rows are basically the same for all the tests? Is there any specific reason for it (compared to having only one row, for example)?

There is no particular reason for that. It just gives you more rows.


wesm · 2018-06-15T04:56:53Z

@atuldambalkar @YashpalThakur @yashpal is this no longer WIP? If so, can you update the PR title? I think this needs a last look from @siddharthteotia before merging

atuldambalkar · 2018-06-15T05:00:06Z

@wesm We are pretty much done with the changes for the code review comments and are now waiting for final comments or a PR merge from @laurentgo and @siddharthteotia

atuldambalkar · 2018-06-15T05:01:16Z

Removed the [WIP] from the title.

siddharthteotia · 2018-06-15T20:49:45Z

java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/Table.java

+        Double[] arr = new Double[values.length];
+        int i = 0;
+        for (String str: values) {
+            arr[i++] = Double.parseDouble(str);


I don't think this indentation is correct and consistent with what is followed in rest of the codebase

Can you please do checkstyle validation on all files?

siddharthteotia · 2018-06-15T20:52:30Z

LGTM. But I think we need to fix the indentation used in this patch. It seems to be different from what is used in the rest of the Java code base in Arrow.

@atuldambalkar , can you please do checkstyle validation? I was under the impression that it was a part of travis-ci build


atuldambalkar · 2018-06-16T13:27:22Z

Thanks, @siddharthteotia for your comments. I have fixed the indentation for all the JDBC adapter code files based on the checkstyle warnings. Please take a look.

siddharthteotia · 2018-06-17T20:26:58Z

@wesm, @xhochy, what is the general practice when squashing commits and merging? This PR has 148 commits; do we want all those individual commit messages to be part of the commit message in master? Or should the PR owner squash and push again with a proper commit message for the overall feature?

wesm · 2018-06-21T10:32:59Z

@siddharthteotia it may have been better to use the merge tool for this patch since there were multiple authors (at least 2) -- we can do some squashing ourselves to try to preserve at least the authorship (though in this case it would be a lot of work). GitHub doesn't handle multiple authors in the merge UI as far as I can tell.

I think this is an exceptional case; if something like this happens again we should either clean up the commits before merging, or use the GitHub UI and write the authors' names in the commit message


Atul Dambalkar and others added 30 commits February 15, 2018 06:22

Initial commit

1b2d27f

Added necessary code for JDBC to Arrow schema creation and Arrow vect…

c482b50

…or objects creation.

Started adding required testcases related code.

e462f80

Added *.yml under exclude list.

89d02bf

Added *.properties under exclude list.

c9d2727

Added test case code - first cut.

b003abb

Used to YAML for the test data. Fixed issues in the vector creation for CLOB.

Merge branch 'master' of https://github.com/apache/arrow

134e339

Code changes to create poper vector objects.

333bc01

Code changes to create poper vector objects.

32fa968

Code changes to allocate memory for the vectors before adding objects.

1e5c584

Fixed code to handle dataset size.

127b4fb

Fixed code to handle only one column in select query.

Removed unused import.

8e2b7f5

Test

772ae96

Merge pull request #1 from atuldambalkar1/master

a556695

Test

Initial commit

e3c490d

Added necessary code for JDBC to Arrow schema creation and Arrow vect…

df444ad

…or objects creation.

Started adding required testcases related code.

15f281f

Added *.yml under exclude list.

a32f788

Added *.properties under exclude list.

1b4175f

Added test case code - first cut.

4c706f5

Used to YAML for the test data. Fixed issues in the vector creation for CLOB.

Code changes to create poper vector objects.

9ead27d

Code changes to create poper vector objects.

73b0198

Code changes to allocate memory for the vectors before adding objects.

c97e910

Fixed code to handle dataset size.

9459221

Fixed code to handle only one column in select query.

Removed unused import.

5c1f5f2

Test

f76ac48

Removed the size parameter from the API.

a4d2b32

Added new test file

Added doc comment.

7f70a67

Committed JdbcToArrowTest.java with a new method to test commit and p…

c2ac474

…ush in forked branch

atuldambalkar and others added 10 commits May 28, 2018 11:27

Merge branch 'master' into master

0c78755

File committed for Code Coverage related changes

67593cd

File committed for Code Coverage related changes

d260342

File committed for Code Coverage related changes

6254260

File committed for Code Coverage related changes

c9b22fe

File committed for Code Coverage related changes

fa65a31

File committed for Code Coverage related changes

9a2f463

File committed for Code Coverage related changes

7125d6e

File committed for Code Coverage related changes

b124ece

Merge pull request #8 from YashpalThakur/master

654a5e4

Pull Request created for merging Code Coverage related changes

laurentgo reviewed May 30, 2018


yashpal and others added 4 commits May 31, 2018 05:22

File committed for the changes made as part of review comment impleme…

e189b14

…ntation

File committed for the changes made as part of review comment impleme…

c66f4a2

…ntation

File committed for the changes made as part of review comment impleme…

abaea4e

…ntation

Merge pull request #9 from YashpalThakur/master

16d8ec1

Files committed to merge changes made for review comment implementation

atuldambalkar changed the title ~~ARROW-1780 - [WIP] JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects~~ ARROW-1780 - JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects Jun 15, 2018

siddharthteotia reviewed Jun 15, 2018


atuldambalkar added 2 commits June 16, 2018 05:39

Merge branch 'master' of https://github.com/apache/arrow

25eadcf

Fixed indentation for the code as per checkstyle and rest of the Arro…

dd1ffa4

…w code.

siddharthteotia approved these changes Jun 17, 2018


siddharthteotia merged commit e17f95d into apache:master Jun 19, 2018

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025

ARROW-1780 - JDBC Adapter to convert Relational Data objects to Arrow…

2c4fffe

… Data Format Vector Objects (apache#1759) This patch adds JDBC adapter support for Arrow
