A collection of Python scripts that integrate pandas functionality with JanusGraph and Gremlin servers. The files in this repository should enable a user to create a sample database of every zip code in the United States as a proof of concept.

justin-napolitano/JanusGraphAPI

JanusGraphAPI

A collection of Python scripts that integrate pandas functionality with JanusGraph and Gremlin servers. This repository provides tools to create a sample database of every zip code in the United States as a proof of concept.


Features

  • Connect to JanusGraph and Gremlin servers using Python
  • Load and transform US city and zip code data using pandas
  • Create and modify graph vertices with properties such as is_root and sprouted
  • Query and manipulate graph data programmatically
  • Clear existing graph data for fresh imports
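
As a rough illustration of the pandas transformation step, the sketch below turns zip code rows into plain dictionaries of vertex properties. It does not use the repository's PandasFunctions module; the function name and the column names (`zip`, `primary_city`, `state`) are assumptions made for the example and may not match the actual CSV headers.

```python
import pandas as pd


def zip_rows_to_vertices(df):
    """Return one dict of vertex properties per zip code row.

    Hypothetical helper: column names are assumed, not taken from the repo.
    """
    out = df.copy()
    # Zip codes are identifiers, not numbers: restore leading zeros.
    out["zip"] = out["zip"].astype(str).str.zfill(5)
    return out.to_dict(orient="records")


# In practice the frame would come from something like
# pd.read_csv("zip_code_database_standard.csv", dtype={"zip": str}).
sample = pd.DataFrame(
    {
        "zip": [501, 90210],
        "primary_city": ["Holtsville", "Beverly Hills"],
        "state": ["NY", "CA"],
    }
)
vertices = zip_rows_to_vertices(sample)
```

Each dictionary can then be written to the graph as the properties of a single vertex.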

Tech Stack

  • Python 3
  • JanusGraph
  • Gremlin Server
  • pandas (via a custom PandasFunctions module)
  • gremlin-python client

Getting Started

Prerequisites

  • Python 3.x installed
  • JanusGraph and Gremlin Server running and accessible
  • Required Python packages: gremlinpython and pandas (see Installation)

Installation

# Clone the repo
git clone https://github.com/justin-napolitano/JanusGraphAPI.git
cd JanusGraphAPI

# Install the dependencies manually (or use requirements.txt if the repo provides one)
pip install gremlinpython pandas

Running

Scripts are standalone and can be run directly. For example, to prepare the zip code database:

python PrepareZipCodeDB.py

To add the is_root property to city vertices:

python AddIsRootProperty.py
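
What such a script does can be sketched with a single gremlin-python traversal. This is not the code in AddIsRootProperty.py; the function name, the `city` label, and the default value of `True` are assumptions, and the snake_case step names assume gremlinpython 3.5 or later (older releases use `hasLabel`).

```python
def add_is_root(g, label="city", value=True):
    """Set is_root on every vertex with the given label.

    g is a gremlin-python GraphTraversalSource. The label and the
    default value are illustrative assumptions.
    """
    # iterate() sends the traversal to the server and discards the result.
    g.V().has_label(label).property("is_root", value).iterate()
```

The sprouted property could be added the same way by changing the property key.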

To clear all vertices in the graph:

python CleareGraph.py
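
Clearing a graph comes down to dropping every vertex, which also removes the edges attached to them. The sketch below is an assumption about how this could be written with gremlin-python, not a copy of CleareGraph.py; the `confirm` flag is a hypothetical safeguard added for the example.

```python
def clear_graph(g, confirm=False):
    """Drop all vertices (and therefore all edges) from the graph.

    g is a gremlin-python GraphTraversalSource. The confirm flag is a
    hypothetical guard against wiping the database by accident.
    """
    if not confirm:
        raise ValueError("pass confirm=True to delete every vertex")
    g.V().drop().iterate()
```

Requiring an explicit `confirm=True` makes the destructive call harder to trigger by mistake when the same connection is used for imports.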

Project Structure

  • AddIsRootProperty.py: Adds an is_root property to city vertices.
  • AddChildrenToRoot.py: Contains functions to retrieve root vertices and their children URLs.
  • AddSproutedProperty.py: Adds a sprouted property to city vertices.
  • Application.py and TSubmit.py: Example scripts demonstrating connection and vertex addition.
  • CleareGraph.py: Deletes all vertices from the JanusGraph database.
  • GremlinConnect.py: Defines the GremlinConnection class for connecting to and interacting with a Gremlin server.
  • PrepareCityDB.py and PrepareZipCodeDB.py: Prepare city and zip code dataframes and export them.
  • Query.py: Query utilities for retrieving vertices by label or property.
  • CSV files (uscities.csv, zip_code_database_standard.csv, state_df.csv): Source data files.
  • add children_to_root_url.py: Placeholder or incomplete script.
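
For orientation, lookups by label or property of the kind Query.py is described as providing might look like the sketch below. The function names are hypothetical and are not taken from Query.py; the step names assume gremlinpython 3.5 or later.

```python
def vertices_by_label(g, label):
    """Return id, label, and properties for every vertex with this label."""
    return g.V().has_label(label).element_map().to_list()


def vertices_by_property(g, key, value):
    """Return id, label, and properties for vertices where key == value."""
    return g.V().has(key, value).element_map().to_list()
```

Against a live server, each call returns a list of dictionaries, one per matching vertex.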

Assumptions

  • The scripts assume the JanusGraph (Gremlin Server) endpoint is reachable at IP address 192.168.1.195 on port 8182; adjust these values to match your environment.
  • The PandasFunctions module is a custom wrapper around pandas for loading, transforming, and writing data.
  • Some scripts are incomplete or placeholders.
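
One way to avoid depending on a fixed address is to read the endpoint from the environment and fall back to the values above. This is a minimal sketch, not the repository's GremlinConnection class: the environment variable names `GREMLIN_HOST` and `GREMLIN_PORT` and both function names are assumptions, and the `/gremlin` path assumes the default Gremlin Server WebSocket endpoint.

```python
import os


def gremlin_url(env=None):
    """Build the WebSocket URL from GREMLIN_HOST / GREMLIN_PORT.

    Both variable names are hypothetical; defaults mirror the
    address and port listed in the assumptions.
    """
    env = os.environ if env is None else env
    host = env.get("GREMLIN_HOST", "192.168.1.195")
    port = int(env.get("GREMLIN_PORT", "8182"))
    return f"ws://{host}:{port}/gremlin"


def open_traversal(url):
    """Return (g, connection) for a remote graph using gremlin-python."""
    # Imported lazily so gremlin_url() works without gremlinpython installed.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    connection = DriverRemoteConnection(url, "g")
    # with_remote is the gremlinpython 3.5+ name; older releases use withRemote.
    return traversal().with_remote(connection), connection


# Typical use (requires a running server):
# g, connection = open_traversal(gremlin_url())
# ... run traversals with g ...
# connection.close()
```

Returning the connection alongside `g` lets the caller close it explicitly once the work is done.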

Future Work / Roadmap

  • Implement full data ingestion pipeline from CSVs to JanusGraph vertices and edges.
  • Enhance error handling and logging across scripts.
  • Add support for edge creation and complex graph queries.
  • Parameterize connection details for flexibility.
  • Provide Docker or environment setup scripts for easier deployment.
  • Add unit and integration tests.
  • Improve documentation with usage examples and API references.
