
Releases: neo4j-field/neo4j-arrow

v4.1 - Fix KHop bug

24 Feb 15:50

Bug fix for v4.

  • Fixes a bug in the SubGraphRecord implementation where field names failed to match in a switch block because they were compared against strings of a different case.

v4 - Bulk Database Imports

24 Jan 15:06

✨ New Stuff!

  • Bulk import jobs (import.bulk) that support bootstrapping a new database on a Neo4j host by streaming nodes and relationships from a neo4j-arrow client. See the example notebook for how it works; a hedged sketch also follows this list.
  • New info jobs (info.server and info.jobs) for querying the server-side plugin version and currently tracked jobs. (See the ServerInfoHandler class.)
  • The Python client/wrapper (neo4j_arrow.py) now has type annotations and passes MyPy in strict mode!
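
A minimal client-side sketch of the bulk import flow. The table construction uses the standard pyarrow API; the client calls are assumptions about neo4j_arrow.py's surface, and the example notebook in the repo is the authoritative reference:

```python
import pyarrow as pa

# Nodes and relationships as plain PyArrow Tables; the schema here is
# purely illustrative.
nodes = pa.table({
    "id": [1, 2, 3],
    "labels": [["Person"], ["Person"], ["Movie"]],
    "name": ["Alice", "Bob", "The Matrix"],
})
rels = pa.table({
    "source_id": [1, 2],
    "target_id": [3, 3],
    "type": ["KNOWS", "ACTED_IN"],
})

# The calls below are hypothetical method names, not the actual
# neo4j_arrow.py API -- consult the example notebook:
# client = neo4j_arrow.Neo4jArrow("neo4j", "password", ("localhost", 9999))
# client.bulk_import("mydb", nodes, rels)   # the import.bulk job
# client.server_info()                      # the info.server job
```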

⚙️ Changes in the Guts

  • Redesigned the Producer code that handles write streams (where the client pushes data to the server). This should resolve some minor bugs in the existing GDS Write Jobs.
  • Lots of runtime type inspection in the Python client.
  • Squashed some sequencing/race-condition bugs in some of the GDS Write Jobs (the job status previously wasn't being advanced properly).

🔨 Major Breaking Changes

  • Python wrapper code has been shuffled around and now lives in ./python. (Versions will be attached to GitHub releases to make this easier.)
  • Job names and parameters have been standardized: snake case for parameters (e.g. idField => id_field) and lowercase dot notation for job names (e.g. cypherRead => cypher.read).

v3.1 - Plug some Memory Leaks

12 Nov 18:44

Fixes for some critical reliability issues in the v3 release:

  • Set up the VectorSchemaRoot to use the same memory allocator as the flushing task
  • Close the VectorSchemaRoot before closing the allocators
  • Add delays to the busy loop that attempts to allocate memory (in WorkBuffer.init()); we were failing too fast

This version should be used in lieu of v3.

v3 - 2-hops and New Plumbing

12 Nov 00:46

  • 🧪 Experimental k-hop (currently k=2) implementation; see KHOP.md for details
  • 👨‍🔧 Major replumbing of the Producer code for reading streams, removing semaphores and lots of lock-contention points. Still a WIP, but showing promise at improving the performance of all read-related jobs.
  • 👟 Snuck in some special "extra" parameters that can be passed in GDS Read actions to tweak the partition count, batch size, and list-length parameters (for k-hop) on a per-job basis.

Next up: more performance tuning! 🏎️

v2 - TLS Support & GDS Write Improvements

25 Oct 20:41

New Features

  • TLS support (mutual TLS not yet supported) for both client and server. A full-chain certificate and private key can be provided to the server via the new ARROW_TLS_CERTIFICATE and ARROW_TLS_PRIVATE_KEY env vars. The Python neo4j_arrow.py client has been updated to allow enabling TLS and, when needed, disabling certificate validation; a sketch follows.
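
Since neo4j-arrow is built on Apache Arrow Flight, plain PyArrow Flight options can illustrate the two client-side TLS modes; the host, port, and certificate path below are assumptions:

```python
import pyarrow.flight as flight

# Hypothetical endpoint for illustration.
location = "grpc+tls://neo4j-host:9999"

# Validate the server against its full-chain certificate...
with open("fullchain.pem", "rb") as f:
    client = flight.FlightClient(location, tls_root_certs=f.read())

# ...or, e.g. for self-signed certs in development, skip validation.
client = flight.FlightClient(location, disable_server_verification=True)
```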

Improvements & Fixes

  • Easier-to-use Arrow memory settings, supporting suffixes (e.g. g, m, t) like when setting JVM heap size. For instance: MAX_MEM_GLOBAL=52g
  • Longer default timeouts for write jobs
  • Fixed a memory leak when writing GDS graphs; they now clean up properly when calling gds.graph.drop() or when shutting down the server.
  • Support for passing native PyArrow Table instances when putting a stream via the neo4j_arrow client (example below)
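
A quick illustration of handing the client a native PyArrow Table, built with the standard pyarrow API; the column names, types, and the put_stream call are assumptions:

```python
import pyarrow as pa

# A native PyArrow Table; the schema here is illustrative only.
table = pa.table({
    "nodeId": pa.array([0, 1, 2], type=pa.int64()),
    "score": pa.array([0.1, 0.5, 0.9], type=pa.float64()),
})

# Hypothetical call -- the actual method name in neo4j_arrow.py may differ:
# client.put_stream(ticket, table)
```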

Known Issues & Future Work

  • No ability to write relationship properties
  • Cypher support needs some more love
  • Error handling of jobs could use improvements
  • GDS writes of relationships end up using inefficient Java types for adjacency lists, etc.
  • GDS write jobs could be improved by removing the synchronous step of fully collecting the stream before processing it

v1 - The Line in the Sand

15 Oct 21:04
Pre-release

Figured I needed to start "tagging" something to have a referenceable build I've personally tested.

At this point, the following should be working:

  • reading nodes and their labels and properties
  • reading relationships and their types and properties
  • writing nodes with labels and properties (those supported by GDS)
  • writing relationships and types (no properties, yet!)

There are definite performance bottlenecks in some of the post-processing after writes, as well as some timing issues in the write jobs.

For instance, if you want to build a graph, you need to do the following (a hedged sketch follows the steps):

  1. Write the nodes, supplying a new graph name (it will be created)
  2. Wait until you see on the server side (via the logs) that it's complete, as the client will report success as soon as the data transfers. (I need a status indicator somewhere.)
  3. Then write the relationships.
  4. As with the nodes, keep an eye on the server and watch for completion.
  5. The graph should be available for use now.
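
A hedged sketch of that workflow, with hypothetical client method names (v1 has no job-status API, so the sleeps stand in for watching the server logs):

```python
import time

# Steps 1-5 from above; every client call here is an illustrative
# assumption, not the actual neo4j_arrow.py API.

# 1. Write the nodes, supplying a new graph name (it gets created).
# client.write_nodes(graph="mygraph", table=node_table)

# 2. The client reports success once the data transfers, so wait and
#    tail the server logs until the node job actually finishes.
time.sleep(30)

# 3. Then write the relationships.
# client.write_relationships(graph="mygraph", table=rel_table)

# 4. Again, watch the server logs for completion.
time.sleep(30)

# 5. The graph should now be available for use in GDS.
```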