A collection of Python scripts that integrate pandas functionality with JanusGraph and Gremlin servers. This repository provides tools to create a sample database of every zip code in the United States as a proof of concept.

Features:

- Connect to JanusGraph and Gremlin servers using Python
- Load and transform US city and zip code data using pandas
- Create and modify graph vertices with properties such as `is_root` and `sprouted`
- Query and manipulate graph data programmatically
- Clear existing graph data for fresh imports
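
As a rough illustration of the pandas loading step, the sketch below reads zip code data while keeping zip codes as strings, so leading zeros (as in `00501`) survive. The column names (`zip`, `primary_city`, `state`), the sample rows, and the helper `load_zip_codes` are assumptions made for illustration; the repository's `PandasFunctions` wrapper may work differently.

```python
import io

import pandas as pd


def load_zip_codes(source):
    """Load zip code rows from a CSV path or file-like object.

    The column names are assumed for illustration and may not match
    the real zip_code_database_standard.csv.
    """
    df = pd.read_csv(
        source,
        dtype={"zip": str},  # keep leading zeros: "00501", not 501
        usecols=["zip", "primary_city", "state"],
    )
    # Normalise city names and drop exact duplicate rows.
    df["primary_city"] = df["primary_city"].str.strip().str.title()
    return df.drop_duplicates().reset_index(drop=True)


# Small in-memory sample standing in for the real CSV file.
SAMPLE_CSV = """zip,primary_city,state,county
00501,holtsville,NY,Suffolk County
10001,new york,NY,New York County
10001,new york,NY,New York County
"""

zip_df = load_zip_codes(io.StringIO(SAMPLE_CSV))
print(zip_df)
```

Passing a file path instead of the `io.StringIO` buffer would read from disk in the same way.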

Tech stack:

- Python 3
- JanusGraph
- Gremlin Server
- pandas (via a custom `PandasFunctions` module)
- gremlin-python client

Prerequisites:

- Python 3.x installed
- JanusGraph and Gremlin Server running and accessible
- Required Python packages (see assumptions below)
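
One way to check the package prerequisite before running any script is sketched below. It uses only the standard library; the helper name `missing_modules` is hypothetical, and the module list reflects the import names of the `gremlinpython` and `pandas` packages.

```python
import importlib.util

# Import names for the packages installed as `gremlinpython` and `pandas`.
REQUIRED_MODULES = ["gremlin_python", "pandas"]


def missing_modules(module_names):
    """Return the subset of module_names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```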

Installation:

    # Clone the repo
    git clone https://github.com/justin-napolitano/JanusGraphAPI.git
    cd JanusGraphAPI
    # (Assuming a requirements.txt exists, or install manually)
    pip install gremlinpython pandas

Usage:

Scripts are standalone and can be run directly. For example, to prepare the zip code database:


    python PrepareZipCodeDB.py

To add the `is_root` property to city vertices:

    python AddIsRootProperty.py

To clear all vertices in the graph:

    python CleareGraph.py

Project structure:

- `AddIsRootProperty.py`: Adds an `is_root` property to city vertices.
- `AddChildrenToRoot.py`: Contains functions to retrieve root vertices and their children URLs.
- `AddSproutedProperty.py`: Adds a `sprouted` property to city vertices.
- `Application.py` and `TSubmit.py`: Example scripts demonstrating connection and vertex addition.
- `CleareGraph.py`: Deletes all vertices from the JanusGraph database.
- `GremlinConnect.py`: Defines the `GremlinConnection` class for connecting to and interacting with the Gremlin server.
- `PrepareCityDB.py` and `PrepareZipCodeDB.py`: Prepare city and zip code dataframes and export them.
- `Query.py`: Query utilities for retrieving vertices by label or property.
- CSV files (`uscities.csv`, `zip_code_database_standard.csv`, `state_df.csv`): Source data files.
- `add children_to_root_url.py`: Placeholder or incomplete script.
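
To make the script descriptions above more concrete, the sketch below shows the kind of Gremlin queries that clearing the graph, setting a property, and querying by label could involve. It is not taken from the repository: the helper names, the `city` label used in the usage note, and the binding names are all assumptions. Values are passed as bindings rather than formatted into the query string, and the binding names avoid `label`, `key`, and `value`, which can clash with Gremlin's own tokens in script evaluation.

```python
def clear_graph_query():
    """Query that drops every vertex (and, with them, their edges)."""
    return "g.V().drop()", {}


def set_property_query(vertex_label, prop_key, prop_value):
    """Query that sets one property on every vertex with the given label."""
    return (
        "g.V().hasLabel(vertex_label).property(prop_key, prop_value)",
        {
            "vertex_label": vertex_label,
            "prop_key": prop_key,
            "prop_value": prop_value,
        },
    )


def vertices_by_label_query(vertex_label):
    """Query that returns the property maps of all vertices with a label."""
    return (
        "g.V().hasLabel(vertex_label).valueMap()",
        {"vertex_label": vertex_label},
    )


def run(client, query_and_bindings):
    """Submit a (query, bindings) pair and wait for the full result list.

    `client` is expected to behave like gremlin_python.driver.client.Client.
    """
    query, bindings = query_and_bindings
    return client.submit(query, bindings).all().result()
```

With gremlinpython installed and a server running, a client could be created with `Client("ws://192.168.1.195:8182/gremlin", "g")` from `gremlin_python.driver.client`, after which `run(client, set_property_query("city", "is_root", True))` would submit the second query.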

Assumptions:

- The JanusGraph server endpoint is assumed to be IP address `192.168.1.195`, port `8182`.
- The `PandasFunctions` module is a custom wrapper around pandas for loading, transforming, and writing data.
- Some scripts are incomplete or placeholders.
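
Since the endpoint is an assumption, one possible way to keep it configurable is sketched below. The environment variable names `JANUSGRAPH_HOST` and `JANUSGRAPH_PORT` and the `GremlinEndpoint` class are hypothetical; the defaults are the address and port listed above, and `/gremlin` is the default WebSocket path of Gremlin Server.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GremlinEndpoint:
    """Host and port of the Gremlin Server (defaults from this README)."""

    host: str = "192.168.1.195"
    port: int = 8182

    @property
    def url(self):
        # Gremlin Server exposes its WebSocket endpoint at /gremlin by default.
        return f"ws://{self.host}:{self.port}/gremlin"


def endpoint_from_env(env=None):
    """Build an endpoint from JANUSGRAPH_HOST / JANUSGRAPH_PORT.

    Both variable names are hypothetical; unset variables fall back
    to the defaults above.
    """
    env = os.environ if env is None else env
    defaults = GremlinEndpoint()
    return GremlinEndpoint(
        host=env.get("JANUSGRAPH_HOST", defaults.host),
        port=int(env.get("JANUSGRAPH_PORT", defaults.port)),
    )


# With no overrides, the README defaults are used.
print(endpoint_from_env({}).url)  # ws://192.168.1.195:8182/gremlin
```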

Future work:

- Implement full data ingestion pipeline from CSVs to JanusGraph vertices and edges.
- Enhance error handling and logging across scripts.
- Add support for edge creation and complex graph queries.
- Parameterize connection details for flexibility.
- Provide Docker or environment setup scripts for easier deployment.
- Add unit and integration tests.
- Improve documentation with usage examples and API references.
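
As one possible starting point for the error handling and logging item, the sketch below wraps an arbitrary operation (for example, opening a server connection) in a retry loop that logs each failure. The helper name `with_retries`, the logger name, and the default values are assumptions, not part of the repository.

```python
import logging
import time

logger = logging.getLogger("janusgraph_scripts")


def with_retries(operation, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(), retrying on any Exception and logging each failure.

    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            logger.exception("Attempt %d of %d failed", attempt, attempts)
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.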