This is an agentic AI system designed to help Construction Architects review complex construction documents and identify inconsistencies between original specifications/drawings and Contractor Submittals.
Construction Architects are responsible for reviewing construction documents and flagging inconsistencies between original specifications/drawings and contractor submittals. This is a time-consuming, error-prone process that can take hours or days, depending on the complexity of the project and the number of documents involved. Submittals and CAD drawings can run to hundreds of pages of complex, unstructured information that is difficult to review manually. Typically, Subject Matter Experts (SMEs) such as Architects and Engineers open multiple PDFs on different screens and compare them by hand.
Our solution is an agentic AI system that automates much of this review, surfacing inconsistencies along with supporting evidence for the architect to verify.
The system takes the following document types as input:
- Construction Specifications: following the CSI standard format.
- Submittals: non-standard documents created by contractors.
- Product Descriptions: created by product manufacturers (e.g., elevator brochures, insulation product descriptions).
- Architectural CAD Drawings (not yet implemented): technical drawings created by architects using sophisticated CAD software (e.g., Revit, AutoCAD).
The system uses sophisticated parsing mechanisms (including OCR) to extract textual information from these documents, then executes a series of steps to prepare the data for comparison.
For CSI Specifications, the system:
- uses a hierarchical "sectionization" mechanism to extract CSI sections,
- splits the sections into smaller chunks,
- uses an LLM to extract technical facts from them.
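The first-pass sectionization can be sketched as a simple heading scan. This is an illustrative sketch, not the actual implementation; the real system is Docling-based and more robust:

```python
import re

# Detect CSI "PART 1/2/3" headings and group the lines under them.
PART_RE = re.compile(r"^PART\s+([123])\b", re.IGNORECASE)

def sectionize(lines):
    """Group spec lines under their CSI PART headings."""
    sections, current = {}, None
    for line in lines:
        m = PART_RE.match(line.strip())
        if m:
            current = f"PART {m.group(1)}"
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return sections

spec = [
    "PART 1 - GENERAL",
    "1.1 SUMMARY",
    "PART 2 - PRODUCTS",
    "2.1 ELEVATOR",
]
print(sectionize(spec))
```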
For Submittals and Product Descriptions, the system:
- splits the text into smaller chunks.
- indexes the chunks' embeddings in a vector database for hybrid (semantic and keyword) search.
In addition, the system implements an agentic comparison workflow that compares the facts extracted from the CSI Specifications against the relevant sections in the Submittals and Product Descriptions.
Finally, the system generates a report for the user highlighting the inconsistencies, with evidence from the original documents to support the findings. The user can then review the report and take appropriate action.
| Component | Tool/Library | Rationale |
|---|---|---|
| Backend | FastAPI | Modern, fast web framework with automatic OpenAPI docs |
| | Python 3.13 | Latest Python version with performance improvements |
| | Pydantic | Data validation and settings management |
| | uvicorn | ASGI server |
| | uv | Python dependency manager |
| | Docling | PDF parsing with OCR support (unstructured.io can also be an option) |
| | LangChain | LLM orchestration and chaining |
| | LangGraph | State machine for agentic workflows |
| | LangSmith | Observability and tracing for LLM workflows |
| | OpenAI | GPT-4 family models for fact extraction and comparison |
| | FastEmbed (BAAI/bge-small-en-v1.5) | Embedding model for dense retrieval |
| | Qdrant | Vector store for hybrid search |
| | MongoDB | Document storage for facts and metadata |
| | Pint | Unit normalization and conversion |
| Frontend | React + TypeScript + Vite | Modern frontend framework with fast dev server |
| | Tailwind CSS | Utility-first CSS framework for styling |
| Evaluation | RAGAS | Retrieval performance evaluation framework |
- Fact Extraction: We use an LLM to extract technical facts from the CSI Specifications using a flexible EAV JSONL schema (Entity-Attribute-Value with evidence).
- Comparison: We use a LangChain agent to orchestrate the comparison workflow. The agent uses a Retrieval-Augmented Generation (RAG) approach to compare the extracted facts from the CSI Specifications against the relevant sections in the Submittals and Product Descriptions.
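For illustration, a single extracted fact might be serialized as one JSONL record like the following. The field names here are assumptions for the sketch, not the system's exact schema:

```python
import json

# Illustrative EAV (Entity-Attribute-Value with evidence) fact record.
fact = {
    "entity": "Passenger Elevator",
    "attribute": {"name": "rated_capacity", "raw": "Rated capacity: 2500 lbs"},
    "value": {"op": ">=", "num": 2500, "unit": "lb"},
    "evidence": {"section": "PART 2 - PRODUCTS", "page": 4},
}

# Facts are stored one JSON object per line (JSONL).
line = json.dumps(fact)
print(line)
```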
- Specifications:
- PDF document following the CSI standard format containing technical specifications for a specific product used in a construction project.
- This document is the source of truth for the technical requirements of a specific product used in the project. From it, we extract the technical facts that we will compare against the submittals.
- Submittals and Product Descriptions:
- PDFs created by contractors or manufacturers containing product and installation details about the products they are submitting for a specific project.
- These documents are the target of our comparison. The Agent will compare the technical facts extracted from the specifications against the relevant sections in the submittals.
Sample documents can be found in the `data` folder.
┌─────────────────────────────────────┐
│ SPECIFICATION │
│ → Notebook-style sectionization │
│ → CSI-aware chunking │
│ → Store sections + chunks │
│ → NO vector indexing │
├─────────────────────────────────────┤
│ SUBMITTAL / PRODUCT_DESCRIPTION │
│ → HybridChunker (table-aware) │
│ → Store chunks only │
│ → Vector indexing in Qdrant │
└─────────────────────────────────────┘
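The routing above can be sketched as follows; the function and step names are illustrative, not the actual service code:

```python
# Specifications are sectionized and chunked but never vector-indexed;
# submittals and product descriptions are chunked and indexed in Qdrant.
def route(doc_type: str) -> list[str]:
    if doc_type == "SPECIFICATION":
        return ["sectionize", "csi_chunk", "store_sections_and_chunks"]
    if doc_type in ("SUBMITTAL", "PRODUCT_DESCRIPTION"):
        return ["hybrid_chunk", "store_chunks", "index_in_qdrant"]
    raise ValueError(f"unknown document type: {doc_type}")

print(route("SPECIFICATION"))
```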
- Specification documents:
  - Strategy:
    - First pass: hierarchical sectionization (PART 1/2/3 structure following the CSI standard)
    - Second pass: chunking within sections, aligned to headings/bullets
    - ~700 tokens per chunk with 10-15% overlap
  - Rationale: leverages the hierarchical structure of CSI specifications to extract technical facts. Preserves meaning across headings/paragraphs; overlap improves recall for boundary cases.
The Spec document passes through a multi-step fact-extraction process, which is best understood using the following diagram:
- Submittals and Product Descriptions:
  - Strategy: simple paragraph-based chunking (falling back to sentence-based when paragraphs are too long)
  - ~700 tokens per chunk with 10-15% overlap
  - Rationale: no hierarchical structure to preserve; paragraphs are roughly independent. Overlap helps preserve context recall.
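The chunking strategy above can be sketched as follows, approximating the 700-token budget with a word count (the real system would use a tokenizer):

```python
# Paragraph-based chunking with ~10% overlap carried between chunks.
def chunk_paragraphs(paragraphs, max_words=700, overlap=0.1):
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            keep = int(len(current) * overlap)  # carry ~10% into the next chunk
            current = current[-keep:] if keep else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

paras = [("alpha " * 400).strip(), ("beta " * 400).strip()]
chunks = chunk_paragraphs(paras, max_words=700)
print(len(chunks))
```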
- Table Structure Preservation ✅
- Tables serialized to Markdown format (not triplet notation)
- Maintains readability and structure
- Critical for technical specifications
- Better Document Hierarchy ✅
- Respects document structure from Docling
- Extracts section titles from metadata
- Better context for retrieval
- More Granular Chunking ✅
- Smaller, more focused chunks
- Better for semantic search
- Easier for LLMs to process
- Intelligent Merging ✅
- `merge_peers=True` combines related sections
- Reduces fragmentation
- Maintains semantic coherence
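The table serialization mentioned above can be sketched roughly like this; a hypothetical helper, not Docling's actual API:

```python
# Serialize an extracted table as Markdown (rather than triplet notation),
# keeping the header/row structure readable for the LLM.
def table_to_markdown(header, rows):
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + "---|" * len(header))
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

md = table_to_markdown(["Property", "Value"], [["R-value", "R-30"], ["Thickness", "9.5 in"]])
print(md)
```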
┌─────────────────────────────────────────────────────────────┐
│ FastAPI Application │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Document │ │ Fact │ │ Comparison │ │
│ │ Processing │ │ Extraction │ │ Agent │ │
│ │ Endpoints │ │ Endpoints │ │ Endpoints │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
├─────────┼─────────────────┼─────────────────┼───────────────┤
│ │ │ │ │
│ ┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐ │
│ │ Document │ │ Fact │ │ Comparison │ │
│ │ Processing │ │ Extraction │ │ Agent │ │
│ │ Service │ │ Service │ │ Service │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
├─────────┼─────────────────┼─────────────────┼───────────────┤
│ │ │ │ │
│ ┌──────▼─────────────────▼─────────────────▼──────────┐ │
│ │ Core Domain Models │ │
│ │ (Section, Chunk, Fact, Entity, Attribute, Value) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Docling │ │ LangChain │ │ Qdrant │ │
│ │ Integration │ │ /LangGraph │ │ VectorStore │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ MongoDB │ │ OpenAI │ │ Pint │ │
│ │ Client │ │ LLM │ │ Unit Normaliz│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
RAG (Retrieval-Augmented Generation) system and LangGraph-based comparison agents for comparing specification facts against submittal documents.
┌─────────────────┐
│ Spec Fact │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Query Builder │
│ (Dense+Sparse) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Ensemble │
│ Retriever │
└────────┬────────┘
│
▼
┌─────────────────┐
│ LangGraph │
│ Comparison │
│ Agent │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Verdict + │
│ Evidence │
└─────────────────┘
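The flow above can be sketched end to end with stubbed components. Every function here is an illustrative stand-in for the real query builder, ensemble retriever, and LangGraph agent:

```python
def build_query(fact):
    # Dense query: natural-language restatement; sparse query: keywords.
    return {"dense": f"{fact['entity']} {fact['attribute']}",
            "sparse": [fact["attribute"], str(fact["value"])]}

def retrieve(query, chunks, k=2):
    # Stand-in for the ensemble retriever: rank chunks by keyword hits.
    scored = sorted(chunks, key=lambda c: -sum(kw.lower() in c.lower() for kw in query["sparse"]))
    return scored[:k]

def compare(fact, evidence):
    # Stand-in for the LLM agent: check the required value appears verbatim.
    hit = next((c for c in evidence if str(fact["value"]) in c), None)
    return {"verdict": "consistent" if hit else "unclear", "evidence": hit}

fact = {"entity": "elevator", "attribute": "capacity", "value": "2500 lbs"}
chunks = ["Rated capacity: 2500 lbs", "Door height: 84 in"]
result = compare(fact, retrieve(build_query(fact), chunks))
print(result)
```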
Once the server is running, access the interactive API documentation:
- Swagger UI: http://localhost:8000/api/v1/docs
- ReDoc: http://localhost:8000/api/v1/redoc
- Web Interface: http://localhost:3000
To evaluate the performance of our retrieval system, we used the RAGAS framework to compare how different retrieval strategies perform across four key metrics provided out of the box by the evaluation framework:
- Context Precision: Relevance of retrieved chunks
- Context Recall: Coverage of relevant information
- Faithfulness: Accuracy of generated answers
- Answer Relevancy: Relevance to query
We leveraged a "golden dataset" built on top of the Synthetic Data Set (Q&A) generated by RAGAS during the experimentation phase. Given the nature of the use case, we had to build a balanced golden dataset for Submittal/Product Description documents.
It bridges the Synthetic Data Generation (spec-only QA) to our task:
- Convert SDG QA → Spec Fact (keep attribute.raw, parse op/num/unit).
- Pair each fact with submittal chunk(s) labeled consistent / inconsistent / unclear (from real submittals or synthetics).
- Store a gold evidence string (in the submittal) when consistent/inconsistent is determinable.
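A hedged sketch of that conversion, with assumed field names and a simplistic numeric parser:

```python
import re

def qa_to_gold(qa, submittal_chunk, label, gold_evidence=None):
    # Parse a numeric requirement like ">= 2500 lbs" out of the answer text.
    m = re.search(r"(>=|<=|=)?\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)", qa["answer"])
    fact = {
        "attribute": {"raw": qa["question"]},
        "value": {"op": m.group(1) or "=", "num": float(m.group(2)), "unit": m.group(3)} if m else None,
    }
    return {"fact": fact, "chunk": submittal_chunk, "label": label, "evidence": gold_evidence}

gold = qa_to_gold(
    {"question": "What is the rated capacity?", "answer": ">= 2500 lbs"},
    "Rated capacity: 2500 lbs",
    label="consistent",
    gold_evidence="Rated capacity: 2500 lbs",
)
print(gold["fact"]["value"])
```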
With that golden dataset we can evaluate both halves:
- Retrieval (from submittals): context precision/recall (RAGAS) vs reference submittal contexts.
- Comparison: verdict accuracy and evidence quality.
Prioritizing context recall and precision, and given that the use case deals with technical, factual data, we conclude that the best retriever is the Ensemble Retriever combining the Parent Document Retriever and the BM25 Retriever.
The Construction Specification Assistant use case has a peculiarity that makes retrieval challenging: we are not relying on a human to prompt the system with a query. Instead, the query is a set of technical facts extracted from the specification documents (with the help of an LLM), not natural language. From these extracted facts, we use the LangChain Ensemble Retriever to retrieve the most relevant chunks from the submittal documents, performing a hybrid search with sparse (BM25, keyword-based) and dense (natural-language embedding) queries.
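LangChain's EnsembleRetriever merges the ranked lists from its sub-retrievers; the core idea can be sketched with Reciprocal Rank Fusion (RRF). The weights and `k` below are illustrative defaults, not this project's configuration:

```python
def rrf(rankings, weights=None, k=60):
    """Fuse several ranked lists of doc ids into one ranking via RRF."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for w, ranking in zip(weights, rankings):
        for rank, doc in enumerate(ranking):
            # Each list contributes w / (k + rank) style scores; lower
            # ranks (earlier positions) contribute more.
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranked = ["chunk_a", "chunk_c", "chunk_b"]   # sparse retriever output
dense_ranked = ["chunk_b", "chunk_a", "chunk_d"]  # dense retriever output
print(rrf([bm25_ranked, dense_ranked]))
```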
Here are the comprehensive results collected by the evaluation:
📊 RAGAS Metrics Comparison Across Retrievers
| Retriever | faithfulness | answer_relevancy | context_precision | context_recall |
|---|---|---|---|---|
| naive_retrieval_chain | 0.4222 | 0.3443 | 0.4667 | 0.5039 |
| bm25_retrieval_chain | 0.4075 | 0.3140 | 0.4444 | 0.5277 |
| contextual_compression_retrieval_chain | 0.4078 | 0.4160 | 0.4519 | 0.5065 |
| multi_query_retrieval_chain | 0.4164 | 0.3808 | 0.4444 | 0.4967 |
| parent_document_retrieval_chain | 0.4258 | 0.4026 | 0.4370 | 0.5046 |
| ensemble_retrieval_chain | 0.3954 | 0.3916 | 0.4222 | 0.5106 |
Here is a visual representation of the results:
- Add support for local models (LLM and embeddings) using Ollama
- Enhance the UX/UI
- Add support for product descriptions
- Add support for visual grounding, allowing users to verify the provenance of the information by looking at the original PDFs
- Add user authentication
- Add support for WebSocket-based real-time updates
- Add support for advanced filtering and search
- Add support for dark mode
- Add support for parsing CAD drawings and comparing them against the specifications