MEDGraphy is an intelligent, interactive drug information application that leverages a Neo4j graph database and a Retrieval-Augmented Generation (RAG) pipeline with the Groq Llama3-8B model. It provides users with a powerful tool to query medical information, check for drug interactions, and visualize complex relationships within the data.
- Full RAG Pipeline: Ask general questions about medicines and get context-aware answers generated by an LLM.
- Direct Medicine Lookup: Quickly find the uses and side effects of a specific drug.
- Condition-Based Search: Discover which medicines are used to treat a particular condition.
- Interaction Checker: Identify potential drug interactions by finding medicines with the same active ingredients.
- Vector Similarity Search: Find semantically similar medicines based on their descriptions or uses.
- Interactive Graph Visualization: Explore the relationships between medicines, conditions, side effects, and active ingredients in an interactive graph.
- Frontend: Streamlit
- Backend: Python
- Database: Neo4j
- LLM: Groq (Llama3-8B)
- Embeddings: Sentence-Transformers
- Python 3.8+
- A running Neo4j Aura instance (or local installation)
- A Groq API Key
git clone <your-repo-link>
cd MEDGraphy
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
Create a `.env` file in the root directory and add your credentials. For Streamlit Cloud deployment, use `st.secrets` instead.
# Neo4j Credentials
NEO4J_URI="neo4j+s://your-aura-instance.databases.neo4j.io"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="your-password"
# Groq API Key
GROQ_API_KEY="your-groq-api-key"
The data for this project is sourced from a CSV file and needs to be loaded into your Neo4j database. This process includes creating nodes, relationships, and vector embeddings.
A Google Colab notebook has been prepared to handle this entire data loading and graph creation process.
➡️ Open the Data Loading Notebook in Google Colab
Follow the instructions in the notebook to connect to your Neo4j instance and populate it with the required data.
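Both the notebook and the app need the four credentials listed above. As a minimal sketch (not the project's actual loader), the helper below reads them from any mapping, such as `os.environ` or `st.secrets`, and fails early when one is missing. The names `Settings` and `load_settings` are hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Keys taken from the .env example above.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "GROQ_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the credentials the app needs."""
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    groq_api_key: str


def load_settings(source: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping; defaults to the process environment."""
    source = os.environ if source is None else source
    # Treat empty strings as missing so a blank .env entry is caught too.
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return Settings(
        neo4j_uri=source["NEO4J_URI"],
        neo4j_user=source["NEO4J_USER"],
        neo4j_password=source["NEO4J_PASSWORD"],
        groq_api_key=source["GROQ_API_KEY"],
    )
```

When running locally, the `.env` file still has to be loaded into the environment first (for example with the `python-dotenv` package, if it is part of your requirements); on Streamlit Cloud, `st.secrets` could be passed as `source` instead.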
Once your database is populated, you can run the application locally:
streamlit run streamlit_app.py
The application will open in your web browser, ready for you to explore!
You can also ingest `data/Medicine_Details.csv` directly into Neo4j, including embedding generation and relationship extraction:
python ingest_graph.py --csv data/Medicine_Details.csv --clear
Flags:
- `--limit N`: ingest only the first N rows (debug)
- `--clear`: wipe the existing graph before loading
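For readers who want to see how such flags are typically wired up, here is a hedged sketch using `argparse`. It mirrors the three options shown above, but it is an illustration only: the actual `ingest_graph.py` may parse its arguments differently, and `build_parser` and `parse_args` are hypothetical names.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented --csv, --limit and --clear flags."""
    parser = argparse.ArgumentParser(description="Ingest a medicine CSV into Neo4j.")
    parser.add_argument("--csv", required=True, help="path to the source CSV file")
    parser.add_argument("--limit", type=int, default=None, metavar="N",
                        help="ingest only the first N rows (debug)")
    parser.add_argument("--clear", action="store_true",
                        help="wipe the existing graph before loading")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["--csv", "data/Medicine_Details.csv", "--limit", "50"])` yields a namespace whose `limit` is `50` and whose `clear` is `False`.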
Nodes:
- `Medicine {name, composition, uses_text, side_effects_text, image_url, excellent_review_pct, average_review_pct, poor_review_pct, embedding}`
- `ActiveIngredient {name}`
- `SideEffect {name}`
- `Condition {name}`
- `Manufacturer {name}`
Relationships:
- `(Medicine)-[:CONTAINS_INGREDIENT]->(ActiveIngredient)`
- `(Medicine)-[:HAS_SIDE_EFFECT]->(SideEffect)`
- `(Medicine)-[:TREATS]->(Condition)`
- `(Medicine)-[:MANUFACTURED_BY]->(Manufacturer)`
- `(Medicine)-[:INTERACTS_WITH {basis:'shared_ingredient', ingredient}]->(Medicine)` (symmetric, created via shared ingredients)
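The `INTERACTS_WITH` relationship is derived rather than loaded from the CSV. The in-memory sketch below shows one way the shared-ingredient rule could be computed; it is an assumption about the logic, not the code in `ingest_graph.py`, and the function name and the example medicine names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple


def shared_ingredient_edges(
    contains: Dict[str, Iterable[str]]
) -> List[Tuple[str, str, str]]:
    """Return (medicine_a, medicine_b, ingredient) for each ordered pair sharing an ingredient."""
    # Invert the medicine -> ingredients mapping.
    by_ingredient: Dict[str, Set[str]] = defaultdict(set)
    for medicine, ingredients in contains.items():
        for ingredient in ingredients:
            by_ingredient[ingredient].add(medicine)

    edges: List[Tuple[str, str, str]] = []
    for ingredient, medicines in sorted(by_ingredient.items()):
        # permutations yields both (a, b) and (b, a), which makes the edges symmetric.
        for a, b in permutations(sorted(medicines), 2):
            edges.append((a, b, ingredient))
    return edges


# Hypothetical sample data: only MedA and MedB share an ingredient.
sample = {
    "MedA": ["paracetamol"],
    "MedB": ["paracetamol", "caffeine"],
    "MedC": ["ibuprofen"],
}
print(shared_ingredient_edges(sample))
# → [('MedA', 'MedB', 'paracetamol'), ('MedB', 'MedA', 'paracetamol')]
```

Each returned tuple corresponds to one directed `INTERACTS_WITH` edge with `basis: 'shared_ingredient'` and the given `ingredient` property.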
Vector Index:
- `medicine_embeddings` on `Medicine.embedding` (dim 384, cosine)
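As a rough illustration of how an index like this is defined and queried, the sketch below holds the two Cypher statements as Python strings and wraps the lookup in a small function. It assumes a recent Neo4j 5.x release that supports `CREATE VECTOR INDEX` and `db.index.vector.queryNodes`, and a `session` object with a `run(query, **params)` method such as the one provided by the official Neo4j Python driver. The function name `similar_medicines` is hypothetical.

```python
from typing import Any, Dict, List, Sequence

EMBEDDING_DIM = 384  # dimension stated for the medicine_embeddings index

# Index definition matching the name, property, dimension and metric listed above.
CREATE_INDEX = """
CREATE VECTOR INDEX medicine_embeddings IF NOT EXISTS
FOR (m:Medicine) ON (m.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 384,
  `vector.similarity_function`: 'cosine'
}}
"""

# Nearest-neighbour lookup against the index.
QUERY_INDEX = """
CALL db.index.vector.queryNodes('medicine_embeddings', $k, $embedding)
YIELD node, score
RETURN node.name AS name, score
"""


def similar_medicines(
    session: Any, embedding: Sequence[float], k: int = 5
) -> List[Dict[str, Any]]:
    """Return the k medicines closest to the given embedding (assumed session API)."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    result = session.run(QUERY_INDEX, k=k, embedding=list(embedding))
    return [dict(record) for record in result]
```

The query embedding must come from the same Sentence-Transformers model that was used at ingestion time; otherwise the similarity scores are not meaningful.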
Added helper methods in graph_rag_query.py:
- `get_medicine_with_image(name)`: fetch rich card info, including the image.
- `symptom_to_medicines(symptoms)`: reverse-map symptom keywords to candidate medicines (based on side effects).
- `justify_prescription(medicines)`: return a structured bundle for LLM justification.
- `interaction_conflicts(medicine)`: fetch pre-computed `INTERACTS_WITH` peers.
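To make the last helper more concrete, here is a hedged sketch of what a lookup over the pre-computed `INTERACTS_WITH` edges might look like. It is not the implementation in `graph_rag_query.py`: the function name `interaction_conflicts_sketch` and the `run_query` callable (anything that takes a Cypher string and a parameter dict and returns rows as dicts) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterable, List, Mapping

# Cypher over the schema described above: follow INTERACTS_WITH from one medicine.
INTERACTION_QUERY = """
MATCH (m:Medicine {name: $name})-[r:INTERACTS_WITH]->(other:Medicine)
RETURN other.name AS medicine, r.ingredient AS ingredient
ORDER BY medicine
"""

RunQuery = Callable[[str, Dict[str, Any]], Iterable[Mapping[str, Any]]]


def interaction_conflicts_sketch(run_query: RunQuery, name: str) -> Dict[str, List[str]]:
    """Group interacting medicines by name, listing the ingredients they share."""
    conflicts: Dict[str, List[str]] = {}
    for row in run_query(INTERACTION_QUERY, {"name": name}):
        conflicts.setdefault(row["medicine"], []).append(row["ingredient"])
    return conflicts
```

Passing the query runner in as a callable keeps the sketch independent of any particular driver setup and makes it easy to exercise with canned rows.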
- Add explicit `:CONTRAINDICATED_WITH` relationships from curated rules (composition clashes, duplicate therapeutic class).
- Integrate external pharmacology ontology (RxNorm / ATC) for class-level reasoning.
- Add graph-based similarity (graph embeddings) using Neo4j Graph Data Science.
- Cache RAG contexts per query to reduce latency.
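The caching idea in the last item could be prototyped along the following lines. This is only a sketch of one possible approach, using `functools.lru_cache` keyed on a normalized query string; `normalize_query`, `make_cached_retriever`, and the `retrieve` callable are hypothetical names and not part of the current codebase.

```python
from functools import lru_cache
from typing import Callable, Iterable, Tuple


def normalize_query(query: str) -> str:
    """Lower-case and collapse whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


def make_cached_retriever(
    retrieve: Callable[[str], Iterable[str]], maxsize: int = 256
) -> Callable[[str], Tuple[str, ...]]:
    """Wrap a context-retrieval function with an in-memory LRU cache."""

    @lru_cache(maxsize=maxsize)
    def _cached(normalized: str) -> Tuple[str, ...]:
        # Store an immutable tuple so cached contexts cannot be mutated by callers.
        return tuple(retrieve(normalized))

    def cached_retrieve(query: str) -> Tuple[str, ...]:
        return _cached(normalize_query(query))

    return cached_retrieve
```

An in-process cache like this is lost on restart and is never invalidated when the graph changes, so it would need to be cleared after re-running ingestion.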