This repository holds a RESTful API implementation that reads its data from a graph database (Neo4j) and exposes the results of various queries about computer science publications. All data used by this project were extracted from the DBLP (Online Computer Science Bibliography) website.
The Publication Graph project consists of three essential parts:
- DBLP dataset Parser
- Neo4j (Graph Database) setup
- RESTful API
The DBLP Parser project performs the ETL process on the dataset of computer science publications provided by DBLP. More specifically, DBLP provides a large XML dataset containing articles, proceedings, incollections, etc.
In our use case we focused only on the following three categories of publication:
- Article: An article published in a journal.
- Inproceedings: An article that appeared in the proceedings of a conference.
- Incollection: A part of a book that has its own title.
Based on the graph schema we chose (see the Labeled Property Graph model below), we collected and transformed the data of each category so that it could be used directly in the final load phase.
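As an illustration of the extract step, the following is a minimal sketch and not the parser's actual code. It streams a DBLP-style XML document and keeps only the three categories listed above. The element and attribute names (`key`, `author`, `title`, `year`, `pages`, `journal`, `booktitle`) follow the public DBLP record format as we understand it; the function name `iter_publications` and the output field names are hypothetical. The real DBLP dump also relies on its DTD for character entities, which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET

# The three publication categories kept by the project.
KEPT_TAGS = {"article", "inproceedings", "incollection"}


def _text(elem, tag):
    """Return the full text of the first child named `tag`, or None."""
    child = elem.find(tag)
    if child is None:
        return None
    return "".join(child.itertext()).strip()


def iter_publications(source):
    """Yield one dict per kept record, streaming so memory use stays low.

    Hypothetical helper: the output field names are assumptions.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in KEPT_TAGS:
            continue
        yield {
            "id": elem.get("key"),
            "category": elem.tag,
            "title": _text(elem, "title"),
            "year": _text(elem, "year"),
            "pages": _text(elem, "pages"),
            # Author order is preserved because findall returns document order.
            "authors": ["".join(a.itertext()).strip() for a in elem.findall("author")],
            # Journals are assumed to use <journal>, the other two <booktitle>.
            "venue": _text(elem, "journal") or _text(elem, "booktitle"),
        }
        elem.clear()  # free the children of the record we just emitted


# Tiny invented sample, used only to exercise the sketch.
SAMPLE = b"""<dblp>
<article key="journals/x/Doe20"><author>Jane Doe</author><author>John Roe</author>
<title>Graphs at Scale</title><year>2020</year><pages>1-10</pages><journal>J. Graphs</journal></article>
<phdthesis key="phd/Foo"><author>A. Foo</author><title>Ignored</title><year>2019</year></phdthesis>
<inproceedings key="conf/y/Roe21"><author>John Roe</author>
<title>Fast Queries</title><year>2021</year><booktitle>GraphConf</booktitle></inproceedings>
</dblp>"""

records = list(iter_publications(io.BytesIO(SAMPLE)))
```

Streaming with `iterparse` is chosen here because the DBLP dataset is large; loading the whole tree at once would likely not fit comfortably in memory.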
The transformed data are used to create the final CSV files. These files contain the information the Publication Graph project needs: authors, publications, relationships between authors and publications (author-[PUBLISHED]->publication), journals, conferences, and the relationships between publications and the journals or conferences that issued them (publication-[ISSUED]->(journal, conference)).
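The split into node files and relationship files described above could look roughly like the sketch below. It is illustrative only: the table names, column names, and the helpers `build_csv_tables` and `to_csv` are assumptions, not the layout the parser actually produces.

```python
import csv
import io


def build_csv_tables(records):
    """Split publication records into node and relationship tables.

    Hypothetical helper: table and column names are assumptions.
    """
    authors = set()
    publications = []
    published = []
    for rec in records:
        publications.append((rec["id"], rec["title"], rec["year"], rec["pages"]))
        for name in rec["authors"]:
            authors.add(name)                    # each author becomes one node
            published.append((name, rec["id"]))  # author-[PUBLISHED]->publication
    return {
        "authors": (["name"], sorted((name,) for name in authors)),
        "publications": (["id", "title", "year", "pages"], publications),
        "published": (["author", "publication_id"], published),
    }


def to_csv(header, rows):
    """Serialise one table to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Invented sample records, used only to exercise the sketch.
sample_records = [
    {"id": "p1", "title": "Graphs at Scale", "year": "2020", "pages": "1-10",
     "authors": ["Jane Doe", "John Roe"]},
    {"id": "p2", "title": "Fast Queries", "year": "2021", "pages": "",
     "authors": ["John Roe"]},
]

tables = build_csv_tables(sample_records)
authors_csv = to_csv(*tables["authors"])
published_csv = to_csv(*tables["published"])
```

Keeping nodes and relationships in separate files means each author is written once, however many publications they have, and the relationship file only carries the keys needed to join the two.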
Finally, copy the CSV files into the docker-compose-setup/neo4j/import directory so that they are ready to be imported into Neo4j.
More information about the DBLP Parser can be found here.
The schema of Neo4j consists of the following nodes and relationships:
- Nodes
  - Author: Properties: name
  - Publication: Properties: id, title, year, pages
  - Journal: Properties: name
  - Conference: Properties: name
- Relationships
  - PUBLISHED: Author-[PUBLISHED]->Publication, Properties: order (values: first, middle and last)
  - ISSUED: Publication-[ISSUED]->Conference or Journal
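The `order` property on PUBLISHED can be derived from an author's position in a publication's author list. The sketch below shows one plausible way to do this; the function name `author_order` is hypothetical, and labelling the sole author of a single-author publication as "first" is an assumption, since the schema above does not say how that case is handled.

```python
def author_order(position, total):
    """Map a 0-based author position to 'first', 'middle' or 'last'.

    Assumption: the only author of a single-author publication is 'first'.
    """
    if total < 1 or not 0 <= position < total:
        raise ValueError("position must satisfy 0 <= position < total")
    if position == 0:
        return "first"
    if position == total - 1:
        return "last"
    return "middle"


def published_rows(publication_id, authors):
    """Build (author, publication_id, order) rows for one publication."""
    total = len(authors)
    return [
        (name, publication_id, author_order(index, total))
        for index, name in enumerate(authors)
    ]


# Invented example: four authors on one publication.
rows = published_rows("p1", ["A. One", "B. Two", "C. Three", "D. Four"])
orders = [order for _name, _pid, order in rows]
```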
After you start the containers with the docker-compose up command (see the Run Project section), you can run the script that imports the data into Neo4j by following the process below.
Execution of script:
- Open browser
- Go to http://localhost:7474/browser/
- Login to Neo4j DB
- Run script.
A REST API was created on top of Neo4j to expose the results of various queries over the nodes and relationships of the graph schema. For the needs of the project, 18 Cypher queries were implemented; they illustrate the features and efficiency of a graph database like Neo4j on heavy queries.
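To give a feel for how a query over this schema might be written, here is a minimal sketch. It is a hypothetical example, not one of the project's 18 queries: the function `top_authors_query`, the query text, and the parameter names are all assumptions built only from the node labels and relationship types listed above.

```python
# Hypothetical query: most prolific authors in a given journal.
TOP_AUTHORS_IN_JOURNAL = """
MATCH (a:Author)-[:PUBLISHED]->(p:Publication)-[:ISSUED]->(j:Journal {name: $journal})
RETURN a.name AS author, count(p) AS publications
ORDER BY publications DESC
LIMIT $limit
"""


def top_authors_query(journal, limit=10):
    """Return (query, params) for the hypothetical query above.

    Values are passed as Cypher parameters rather than formatted into the
    query string, which avoids injection and lets Neo4j reuse the query plan.
    """
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return TOP_AUTHORS_IN_JOURNAL, {"journal": journal, "limit": limit}


# Invented journal name, used only to exercise the sketch.
query, params = top_authors_query("J. Graphs", limit=5)
```

With the official Neo4j Python driver, such a pair could be executed with `session.run(query, params)`, and an API endpoint could then return the resulting rows as JSON.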
The curl requests for all endpoints of the REST API have been collected and can be found here.
The REST API was implemented with the Flask framework.
To run the project, you first have to install the following prerequisite services in your local environment.
Install the docker and docker-compose services:
1. https://docs.docker.com/get-docker/
2. https://docs.docker.com/compose/install/
After installing the docker and docker-compose services, you are ready to run the PublicationGraph project using the following commands:
# Go to docker-compose-setup directory
$ cd docker-compose-setup/
# Build Rest API image
$ docker-compose build
# Start Neo4j and Rest API containers
$ docker-compose up
# Stop Neo4j and Rest API containers
$ docker-compose stop
# Destroy Neo4j and Rest API containers
$ docker-compose down