BIM Graph Agent is a data processing system that converts IFC files into a Neo4j graph database and provides AI-powered natural language querying. The project is an example and demonstration of how to use a graph database such as Neo4j for RAG and AI agent development.
- IFC to Graph Conversion: Automatically processes IFC files and converts them to Neo4j graph database
- Data Integrity: Accurately converts IFC elements and relationships without data loss
- File Metadata Management: Stores and links IFC file information with graph structure
- Natural Language Querying: Ask questions about BIM data in plain English or Korean
- Smart Property Analysis: Analyzes nested JSON properties regardless of modeling tool
- Interactive Console Interface: Real-time conversational interface for BIM data exploration
- Python 3.9 or higher
- Neo4j database (version 4.0 or higher)
- 8GB RAM minimum (16GB recommended for large IFC files)
- 2GB free disk space for database storage
- Ollama server (local LLM runtime)
- qwen2.5-coder:7b model (for Cypher query generation)
- 8GB VRAM recommended (NVIDIA GPU optional for faster processing)
- Internet connection for initial model download
- Create a database named elements in Neo4j (the name must match NEO4J_DATABASE in the .env file)
- Clone or download this project
- Install required packages:
  pip install -r requirements.txt
  Required packages include:
  - ifcopenshell - IFC file parsing
  - neo4j - Neo4j database driver
  - langchain ecosystem - AI framework
  - ollama - Local LLM integration
  - streamlit - Web interface framework
- Configure environment variables in the .env file:
  NEO4J_URI=bolt://localhost:7687
  NEO4J_USER=neo4j
  NEO4J_PASSWORD=your_password
  NEO4J_DATABASE=elements
- Install and set up Ollama for BIM Graph Agent:
  # Install Ollama from https://ollama.ai
  # Pull required model (optimized single model approach)
  ollama pull qwen2.5-coder:7b
  # Start Ollama server
  ollama serve
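Before running the importer it can help to confirm that the .env file defines all four Neo4j settings listed in the configuration step. The following is a minimal sketch using only the standard library; `parse_env`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical helper names, not part of this project, and the project itself may load the file differently.

```python
from pathlib import Path

# The four settings named in the configuration step above.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "NEO4J_DATABASE")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    settings = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing settings:", missing_keys(settings) or "none")
```

Running the script from the project root prints which settings, if any, still need to be filled in.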
python import_ifc.py [options]
Options:
--input-dir DIR Input directory containing IFC files (default: ./input)
--clear-db Initialize database before conversion
--log-level LEVEL Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
--no-log-file Disable file logging
--validate Run validation after conversion
--stats Output statistics after conversion completion
--help Show help message
Convert all IFC files in the input directory:
python import_ifc.py --input-dir ./input --stats
Clear database and convert with debug logging:
python import_ifc.py --clear-db --log-level DEBUG --validate
Launch the AI-powered natural language console interface:
python BIM_graph_agent.py
Launch the Streamlit web application for a user-friendly interface:
streamlit run BIM_graph_agent_web.py
The web app will automatically open in your browser at http://localhost:8501
Web App Features:
- Dark mode chatbot interface
- Real-time query processing
- Markdown rendering for formatted responses
Basic Element Queries:
- How many walls are in the building?
- Show me all doors in the project
- List all windows with their properties
Property-Specific Queries:
- What is the area of room A204?
File and Metadata Queries:
- What IFC files are loaded in the database?
- Show me the file information for this model
Advanced Analysis:
- Find all load-bearing walls
TBD (relationship queries, etc.):
To answer the questions below reliably, the RAG pipeline needs further tuning, including prompt templates, database schema context, LLM refinements, and function calls.
- Find all spaces on Level 2
- Show me walls with thickness greater than 200mm
- What elements are connected to this wall?
- Find all elements that belong to the ground floor
- Calculate total floor area by level
- List all mechanical equipment in the building
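One of the tuning ingredients named above is putting database schema context into the prompt. The sketch below shows one way that could look, assuming a plain string template; `SCHEMA_CONTEXT`, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical names, and the wording is not the project's actual prompt. The labels, properties, and relationship types are the ones this README documents for the graph.

```python
# Schema summary taken from the node and relationship types documented in this README.
SCHEMA_CONTEXT = """\
Nodes:
  (:IFCFile)  file metadata (filename, path, creation date)
  (:Element)  also labeled by IFC class, e.g. :Element:IfcWall
Element properties: globalId, name, ifcClass, description, sourceFileId, properties (stored as JSON)
Relationships: BELONGS_TO_FILE, AGGREGATES, CONNECTS_TO, CONTAINED_IN, ASSIGNED_TO
"""

# Illustrative instruction text only; real prompts usually need iteration.
PROMPT_TEMPLATE = """\
You translate questions about BIM data into a single Cypher query.
Use only the labels, properties, and relationships in the schema.

Schema:
{schema}
Question: {question}
Cypher:"""


def build_prompt(question: str, schema: str = SCHEMA_CONTEXT) -> str:
    """Fill the template with the schema summary and the user's question."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question.strip())


if __name__ == "__main__":
    print(build_prompt("Find all spaces on Level 2"))
```

Giving the model the relationship names up front is what makes questions such as "What elements are connected to this wall?" answerable at all, since the model cannot otherwise know that CONNECTS_TO exists.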
BIM_graph_rag/
├── input/ # Input directory for IFC files
│ └── Duplex_A_20110907.ifc # Sample IFC file
├── src/ # Source code modules
│ ├── ifc_parser.py # IFC file parsing module
│ ├── neo4j_database.py # Neo4j database connection module
│ ├── graph_converter.py # IFC to graph conversion logic
│ └── utils.py # Logging utilities
├── logs/ # Log files directory
├── requirements.txt # Required Python packages
├── .env # Environment configuration
├── import_ifc.py # Main CLI application
├── BIM_graph_expert.py # BIM Graph Agent (AI expert system)
└── README.md # This file
- IFCFile: Contains file metadata (filename, path, creation date, etc.)
- Element: Represents IFC elements with labels like :Element:IfcWall, :Element:IfcDoor
Element properties:
- globalId: IFC GlobalId (unique identifier)
- name: Element name
- ifcClass: IFC class name
- description: Element description
- sourceFileId: Reference to source IFC file
- properties: PropertySet information (stored as JSON)
Relationship types:
- BELONGS_TO_FILE: Links elements to their source IFC file
- AGGREGATES: Aggregation relationships between elements
- CONNECTS_TO: Connection relationships between elements
- CONTAINED_IN: Spatial containment relationships
- ASSIGNED_TO: Group assignment relationships
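To make the node layout concrete, here is a minimal sketch of how one Element node with the properties listed above could be turned into a parameterized Cypher statement. The `Element` dataclass and `merge_statement` function are hypothetical and are not taken from graph_converter.py; the sketch only builds the query text and parameters and does not connect to Neo4j.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Element:
    global_id: str
    ifc_class: str
    name: str = ""
    description: str = ""
    source_file_id: str = ""
    properties: dict = field(default_factory=dict)


def merge_statement(element: Element):
    """Return (cypher, params) that upserts one Element node by globalId."""
    # The label is interpolated into the query text (Neo4j 4.x does not accept
    # labels as parameters), so the class name is validated first.
    if not (element.ifc_class.isalnum() and element.ifc_class.startswith("Ifc")):
        raise ValueError(f"unexpected IFC class: {element.ifc_class!r}")
    cypher = (
        f"MERGE (e:Element:{element.ifc_class} {{globalId: $globalId}}) "
        "SET e.name = $name, e.ifcClass = $ifcClass, "
        "e.description = $description, e.sourceFileId = $sourceFileId, "
        "e.properties = $properties"
    )
    params = {
        "globalId": element.global_id,
        "name": element.name,
        "ifcClass": element.ifc_class,
        "description": element.description,
        "sourceFileId": element.source_file_id,
        # Property sets are nested, so they are serialized to a JSON string.
        "properties": json.dumps(element.properties),
    }
    return cypher, params
```

Serializing the property sets to one JSON string keeps arbitrary nesting intact, at the cost that individual property values cannot be indexed or filtered directly in Cypher.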
The application provides:
- Real-time progress logging
- Conversion statistics (node/relationship counts)
- Element type distribution
- Comprehensive error handling and reporting
The BIM Graph Agent is an AI system that enables natural language querying of BIM graph data with intelligent property analysis.
- Natural Language Processing: User enters questions in English or Korean
- Smart Query Generation: qwen2.5-coder:7b model creates optimized Cypher queries
- Graph Data Retrieval: Executes queries against Neo4j elements database
- Intelligent Property Analysis: Analyzes nested JSON properties from any modeling tool
- Smart Schema Detection: Automatically adapts to different IFC modeling approaches
- Interactive Learning: Shows generated Cypher queries for educational purposes
- Flexible Property Matching: Finds area, volume, and other properties regardless of naming conventions
- Cross-Tool Compatibility: Works with models from different BIM software vendors
- Intelligent Fallback: Returns full property JSON when specific paths are unavailable
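Flexible property matching with a fallback can be sketched as a recursive search over the nested properties JSON. This is an illustration of the idea, not the agent's actual implementation; `find_property` and `lookup` are hypothetical names, and the property-set names in the usage note are made-up examples of how two tools might name the same quantity.

```python
import json


def find_property(properties, keyword):
    """Yield (dotted_path, value) for every nested key containing `keyword`."""
    needle = keyword.lower()

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                here = path + [str(key)]
                # Report only leaf values; containers are searched recursively.
                if needle in str(key).lower() and not isinstance(value, (dict, list)):
                    yield ".".join(here), value
                yield from walk(value, here)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                yield from walk(item, path + [str(index)])

    yield from walk(properties, [])


def lookup(properties_json: str, keyword: str):
    """Return matching (path, value) pairs, or the full properties as a fallback."""
    props = json.loads(properties_json)
    matches = list(find_property(props, keyword))
    return matches if matches else props
```

For example, `lookup('{"Dimensions": {"Area": 12.5}}', "area")` and `lookup('{"BaseQuantities": {"NetFloorArea": 12.5}}', "area")` both find the value despite the different naming, while a keyword with no match returns the whole property dictionary so the LLM can still inspect it.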
- Ask questions in natural language (no technical Cypher knowledge required)
- Be specific about elements (walls, doors, spaces, etc.) for better results
- Property queries automatically search through all available property sets
- The system shows the generated Cypher query and data sources for transparency
Strengths:
- Modular architecture with excellent scalability
- Universal BIM software compatibility (Revit, ArchiCAD, Tekla)
- Natural language interface without technical knowledge requirement
- Educational value through Cypher query visualization
Performance Bottlenecks:
- Response Time: 10+ seconds (limited by an RTX GPU with 8GB VRAM)
- Model Size: qwen2.5-coder:7b (4.7GB) loading overhead
- Memory Constraints: Single GPU VRAM limitations
- Concurrent Users: Limited to single-user sessions
Lightweight Models:
# Replace current model with smaller alternatives
ollama pull qwen2.5-coder:3b # 50% size reduction
ollama pull deepseek-coder:6.7b # Coding-optimized
Quantized Models:
# 4-bit quantization for 75% memory reduction
ollama pull qwen2.5-coder:7b-q4_0 # ~2.5GB
vLLM (Maximum Performance):
- 2-5x inference speed improvement
- Advanced memory management and optimization
- Multi-GPU tensor parallelism support
- Production-grade deployment features
pip install "vllm>=0.2.0"
llama-cpp-python (Recommended for Simplicity):
- Direct model loading without Ollama overhead
- Automatic CPU/GPU detection and optimization
- GGUF quantized model support (4-8bit)
- 50-70% performance improvement with minimal setup
pip install llama-cpp-python
Optimum (Hugging Face Integration):
- One-click optimization with minimal code changes
- ONNX Runtime automatic graph optimization
- Cross-platform hardware acceleration
pip install "optimum[onnxruntime-gpu]"
Multi-GPU Configuration:
- Multi-GPU setup for parallel processing
- vLLM tensor parallelism support
- 80-90% performance improvement potential
- Switch to llama-cpp-python - Direct model loading, 50% speed boost
- Use qwen2.5-coder:3b or quantized models - Immediate memory and speed improvement
- Optimize system settings - Neo4j indexing and Ollama configuration
Cloud API Integration:
- OpenAI GPT-4 or Anthropic Claude API
- Sub-second response times
- Monthly cost: $50-200 depending on usage
- Performance improvement
Dedicated Inference Server:
- vLLM or llama-cpp-python server deployment
- Load balancing for multiple users
- Professional-grade performance optimization
This project uses Neo4j as a demonstration of graph database capabilities for BIM data. However, the choice between graph and traditional databases depends on specific use cases and requirements.
Optimal Use Cases:
- Complex Multi-hop Queries: 10+ level deep relationship traversals
- Real-time Path Finding: Navigation, routing, network analysis
- Operations Research Problems: Supply chain optimization, logistics
- Pattern Detection: Fraud detection, recommendation engines
- Network Analysis: Social networks, infrastructure networks
Example Scenarios:
// Complex relationship traversal (Neo4j strength)
// shortestPath() only accepts a lower bound of 0 or 1, so only the upper bound is set
MATCH path = shortestPath((source)-[*..50]-(target))
WHERE all(r IN relationships(path) WHERE r.weight < 100)
RETURN path, length(path) AS hop_count
Advantages:
- Exceptional performance for deep graph traversals
- Intuitive modeling of complex relationships
- Built-in graph algorithms and analytics
- Visual query representation with Cypher
Disadvantages:
- Steep learning curve (Cypher vs SQL)
- Limited aggregation and reporting capabilities
- Higher operational complexity
- Enterprise licensing costs ($15K-500K+ annually)
- Single-node limitations in Community Edition
MySQL/PostgreSQL Optimal Use Cases:
- Simple to Medium Queries: 1-3 level joins (90% of BIM queries)
- Property-based Searches: Area, material, quantity lookups
- Aggregation and Reporting: Statistics, summaries, dashboards
- High-frequency CRUD Operations: Standard web applications
Example Schema:
-- Relational approach for BIM data (MySQL 8.0 syntax)
CREATE TABLE ifc_elements (
    id VARCHAR(36) PRIMARY KEY,
    ifc_class VARCHAR(50),
    name VARCHAR(255),
    properties JSON,
    INDEX idx_class (ifc_class),
    -- Functional index on the extracted Area value; ->> unquotes the JSON value
    INDEX idx_area ((CAST(properties->>'$.Area' AS DECIMAL(12,3))))
);
Advantages:
- Mature ecosystem and tooling
- Universal SQL knowledge
- Excellent performance for simple queries
- Cost-effective (free/low-cost licensing)
- Rich aggregation and analytical capabilities
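The aggregation and reporting strength of the relational approach can be illustrated with a small in-memory example. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL/PostgreSQL and stores area in a plain numeric column rather than JSON to stay portable; the sample rows and the `area_by_class` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ifc_elements ("
    "id TEXT PRIMARY KEY, ifc_class TEXT, name TEXT, area REAL)"
)
conn.execute("CREATE INDEX idx_class ON ifc_elements (ifc_class)")

# Made-up sample rows, not taken from the Duplex sample model.
rows = [
    ("w1", "IfcWall", "Wall A", 10.0),
    ("w2", "IfcWall", "Wall B", 12.5),
    ("s1", "IfcSpace", "A204", 30.25),
]
conn.executemany("INSERT INTO ifc_elements VALUES (?, ?, ?, ?)", rows)


def area_by_class(connection):
    """Count elements and sum their area per IFC class."""
    cursor = connection.execute(
        "SELECT ifc_class, COUNT(*), ROUND(SUM(area), 2) "
        "FROM ifc_elements GROUP BY ifc_class ORDER BY ifc_class"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for ifc_class, count, total_area in area_by_class(conn):
        print(ifc_class, count, total_area)
```

A single GROUP BY covers the statistics and summary use case listed above without any relationship traversal.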
MongoDB Optimal Use Cases:
- Document-centric BIM Data: Natural fit for IFC property sets
- Flexible Schema Evolution: Varying property structures
- Horizontal Scaling: Large datasets across multiple servers
- JSON-native Operations: Direct property manipulation
Choose Neo4j when:
- Complex relationship analysis is a core business requirement
- 10+ hop graph traversals are frequent
- Real-time pathfinding is essential
- Budget allows for enterprise licensing ($50K+)
- Team has graph database expertise
Choose MySQL/PostgreSQL when:
- Property-based queries dominate (90% of BIM use cases)
- Cost optimization is priority
- Standard SQL expertise is available
- Reporting and analytics are important
- Proven stability is required
Choose MongoDB when:
- Document structure varies significantly
- Horizontal scaling is needed
- JSON manipulation is frequent
- Schema flexibility is important
# Smart routing based on query complexity
class HybridBIMAgent:
    def process_query(self, query):
        if self.requires_deep_traversal(query):
            return self.neo4j_engine.process(query)  # Complex relationships
        else:
            return self.mysql_engine.process(query)  # Standard queries
From Neo4j to SQL:
- 5-10x faster development for standard queries
- 80% cost reduction in licensing and operations
- Easier team onboarding and maintenance
- Better integration with existing tools
From SQL to Neo4j:
- Significant performance gain for relationship queries
- Better modeling of complex interconnections
- Enhanced analytical capabilities
- Higher operational overhead
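The hybrid routing idea (HybridBIMAgent) depends on a requires_deep_traversal check that this README does not define. One simple possibility, shown here purely as an assumption, is a keyword heuristic; `RELATIONSHIP_HINTS` and `HybridRouter` are hypothetical, and the two engines are modeled as plain callables so the sketch runs without any database.

```python
# Phrases that suggest the question is about relationships between elements.
# This list is an illustrative guess and would need tuning on real questions.
RELATIONSHIP_HINTS = (
    "connected to",
    "path",
    "route",
    "adjacent",
    "reachable",
    "upstream",
    "downstream",
)


def requires_deep_traversal(query: str) -> bool:
    """Guess whether a question needs graph traversal rather than a lookup."""
    text = query.lower()
    return any(hint in text for hint in RELATIONSHIP_HINTS)


class HybridRouter:
    """Send relationship-heavy questions to the graph engine, the rest to SQL."""

    def __init__(self, graph_engine, sql_engine):
        self.graph_engine = graph_engine
        self.sql_engine = sql_engine

    def process_query(self, query: str):
        if requires_deep_traversal(query):
            return self.graph_engine(query)
        return self.sql_engine(query)


if __name__ == "__main__":
    router = HybridRouter(lambda q: "graph", lambda q: "sql")
    print(router.process_query("What elements are connected to this wall?"))
    print(router.process_query("What is the area of room A204?"))
```

A keyword list is cheap and transparent but brittle; an LLM-based classifier is another option if misrouted questions become a problem.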
- Neo4j Connection Failed
  - Ensure Neo4j is running
  - Check connection settings in the .env file
  - Verify database credentials
- IFC File Parsing Failed
  - Check if IFC file is valid (IFC2x3 or IFC4 schema)
  - Ensure file is not corrupted
  - Check file permissions
- Memory Issues with Large Files
  - Increase available system memory
  - Process files individually if needed
- Slow AI Response Times
  - Consider switching to lighter models (qwen2.5-coder:3b)
  - Optimize Ollama configuration settings
  - Consider vLLM for production deployments
Log files are automatically created in the logs/ directory with timestamps for troubleshooting.
This project is developed for BIM data processing and graph analysis purposes.
Taewook Kang (laputa99999@gmail.com)










