[NOID] Fixes #3426: add apache arrow import procedure to extended (#3978) (#4183)

* [NOID] Fixes #3426: add apache arrow import procedure to extended (#3978)

* [NOID] java 11 changes

* [NOID] try removing gradle deps

* [NOID] 4.4 changes

* [NOID] spotless and licence changes

* [NOID] fix tests

* [NOID] format changes
vga91 authored Dec 3, 2024
1 parent 9b61bf6 commit 9302bf4
Showing 8 changed files with 687 additions and 0 deletions.
@@ -0,0 +1,118 @@
= apoc.import.arrow
:description: This section contains reference documentation for the apoc.import.arrow procedure.

label:procedure[] label:apoc-extended[]

[.emphasis]
apoc.import.arrow(input, $config) - Imports arrow from the provided arrow file or byte array

== Signature

[source]
----
apoc.import.arrow(urlOrBinaryFile :: ANY?, config = {} :: MAP?) :: (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?, batchSize :: INTEGER?, batches :: INTEGER?, done :: BOOLEAN?, data :: STRING?)
----

== Input parameters
[.procedures, opts=header]
|===
| Name | Type | Default
|urlOrBinaryFile|ANY?|null
|config|MAP?|{}
|===

== Config parameters
This procedure supports the following config parameters:

.Config parameters
[opts=header, cols='1a,1a,1a,3a']
|===
| name | type |default | description
| unwindBatchSize | Integer | `2000` | the batch size of the unwind
| mapping | Map | `{}` | see `Mapping config` example below
|===
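
As a minimal sketch of passing this config, the call below lowers `unwindBatchSize` for a single import. The value `500` and the file name are placeholders chosen for illustration, not recommended settings.

[source,cypher]
----
// unwindBatchSize: 500 is an illustrative value only (the default is 2000)
CALL apoc.import.arrow("fileCreatedViaExportProcedures.arrow", {unwindBatchSize: 500})
----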

== Output parameters
[.procedures, opts=header]
|===
| Name | Type
|file|STRING?
|source|STRING?
|format|STRING?
|nodes|INTEGER?
|relationships|INTEGER?
|properties|INTEGER?
|time|INTEGER?
|rows|INTEGER?
|batchSize|INTEGER?
|batches|INTEGER?
|done|BOOLEAN?
|data|STRING?
|===

[[usage-apoc.import.arrow]]
== Usage Examples

The `apoc.import.arrow` procedure can be used to import arrow files created by the `apoc.export.arrow.*` procedures.


[source,cypher]
----
CALL apoc.import.arrow("fileCreatedViaExportProcedures.arrow")
----

.Results
[opts=header]
|===
| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | done | data
| "fileCreatedViaExportProcedures.arrow" | "file" | "arrow" | 3 | 1 | 15 | 105 | 4 | -1 | 0 | TRUE | NULL
|===


We can also import a file from a binary `byte[]` created by the `apoc.export.arrow.stream.*` procedures.

[source,cypher]
----
CALL apoc.import.arrow(`<binaryArrow>`)
----
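
As a sketch, assuming the byte array is supplied as a query parameter (the name `$binaryArrow` is hypothetical), the call can also restrict its output to a few of the columns listed in the signature:

[source,cypher]
----
// $binaryArrow is a hypothetical parameter holding the Arrow byte[]
CALL apoc.import.arrow($binaryArrow)
YIELD nodes, relationships, properties
RETURN nodes, relationships, properties
----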

=== Mapping config

To import complex types that Arrow does not support natively, such as Point, Duration, or List of Duration,
we can use the `mapping` config to convert values to the desired data type.
For example, if we have a node `(:MyLabel {durationProp: duration('P5M1.5D')})` and export it to an Arrow file or binary,
we can import it by passing an explicit map whose keys are the property keys and whose values are the property types.

In this example, using the load procedure:
[source,cypher]
----
CALL apoc.load.arrow(fileOrBinary, {mapping: {durationProp: 'Duration'}})
----

Or with the import procedure:
[source,cypher]
----
CALL apoc.import.arrow(fileOrBinary, {mapping: {durationProp: 'Duration'}})
----

The mapping value types can be one of the following:

* `Point`
* `LocalDateTime`
* `LocalTime`
* `DateTime`
* `Time`
* `Date`
* `Duration`
* `Char`
* `Byte`
* `Double`
* `Float`
* `Short`
* `Int`
* `Long`
* `Node`
* `Relationship`
* `BaseType` followed by `Array`, to map a list of values, where `BaseType` is any of the previous types, for example `DurationArray`
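
A mapping can hold several entries at once. The following sketch assumes a hypothetical export in which `durationProp` holds a single duration, `durationsProp` a list of durations, and `locationProp` a point; only `durationProp` comes from the example above.

[source,cypher]
----
// durationsProp and locationProp are hypothetical property keys
CALL apoc.import.arrow(fileOrBinary, {
  mapping: {
    durationProp: 'Duration',
    durationsProp: 'DurationArray',
    locationProp: 'Point'
  }
})
----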


@@ -8,6 +8,11 @@ This file is generated by DocsTest, so don't change it!
[.procedures, opts=header, cols='5a,1a,1a']
|===
| Qualified Name | Type | Release
|xref::overview/apoc.import/apoc.import.arrow.adoc[apoc.import.arrow icon:book[]]

apoc.import.arrow(input, $config) - Imports arrow from the provided arrow file or byte array
|label:procedure[]
|label:apoc-full[]
|xref::overview/apoc.import/apoc.import.csv.adoc[apoc.import.csv icon:book[]]

apoc.import.csv(nodes, relationships, config) - imports nodes and relationships from the provided CSV files with given labels and types
@@ -1578,6 +1578,17 @@ apoc.label.exists(element, label) - returns true or false related to label exist
|label:apoc-core[]
|===

== xref::overview/apoc.import/index.adoc[]

[.procedures, opts=header, cols='5a,1a']
|===
| Qualified Name | Type
|xref::overview/apoc.import/apoc.import.arrow.adoc[apoc.import.arrow icon:book[]]

apoc.import.arrow(input, $config) - Imports arrow from the provided arrow file or byte array
|label:procedure[]
|===

== xref::overview/apoc.load/index.adoc[]

[.procedures, opts=header, cols='5a,1a,1a']
@@ -292,6 +292,7 @@ This file is generated by DocsTest, so don't change it!
*** xref::overview/apoc.hashing/apoc.hashing.fingerprintGraph.adoc[]
*** xref::overview/apoc.hashing/apoc.hashing.fingerprinting.adoc[]
** xref::overview/apoc.import/index.adoc[]
*** xref::overview/apoc.import/apoc.import.arrow.adoc[]
*** xref::overview/apoc.import/apoc.import.csv.adoc[]
*** xref::overview/apoc.import/apoc.import.graphml.adoc[]
*** xref::overview/apoc.import/apoc.import.json.adoc[]
5 changes: 5 additions & 0 deletions full/build.gradle
@@ -117,6 +117,11 @@ dependencies {
exclude group: 'io.netty'
}

compileOnly group: 'org.apache.arrow', name: 'arrow-vector', version: '13.0.0'
compileOnly group: 'org.apache.arrow', name: 'arrow-memory-netty', version: '13.0.0'
testImplementation group: 'org.apache.arrow', name: 'arrow-vector', version: '13.0.0'
testImplementation group: 'org.apache.arrow', name: 'arrow-memory-netty', version: '13.0.0'

compileOnly group: 'com.couchbase.client', name: 'java-client', version: '3.3.0', withoutJacksons
testImplementation group: 'com.couchbase.client', name: 'java-client', version: '3.3.0', withoutJacksons
