This is an agentic AI system designed to help Construction Architects review complex construction documents and identify inconsistencies between original specifications/drawings and Contractor Submittals.
Construction Architects are responsible for reviewing construction documents and flagging inconsistencies between original specifications/drawings and contractor submittals. This is a time-consuming, error-prone process that can take hours or days, depending on the complexity of the project and the number of documents involved. Submittals and CAD drawings can run to hundreds of pages of complex, unstructured information that is difficult to review manually. Typically, Subject Matter Experts (SMEs) such as Architects and Engineers open multiple PDFs on different screens and compare them by hand.
Our solution is an agentic AI system that automates much of this review, surfacing inconsistencies along with supporting evidence for the architect to verify.
The system takes the following document types as input:
- Construction Specifications: following the CSI standard format.
- Submittals: non-standard documents created by contractors.
- Product Descriptions: created by product manufacturers (e.g., elevator brochures, insulation product descriptions).
- Architectural CAD Drawings (not yet implemented): technical drawings created by architects using sophisticated CAD software (e.g., Revit, AutoCAD).
The system uses sophisticated parsing mechanisms (including OCR) to extract textual information from these documents, then executes a series of steps to prepare the data for comparison.
For CSI Specifications, the system:
- uses a hierarchical "sectionization" mechanism to extract CSI sections,
- splits the sections into smaller chunks,
- uses an LLM to extract technical facts from them.
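The first-pass sectionization can be sketched as a simple heading scan. This is an illustrative sketch, not the actual implementation; the real system is Docling-based and more robust:

```python
import re

# Detect CSI "PART 1/2/3" headings and group the lines under them.
PART_RE = re.compile(r"^PART\s+([123])\b", re.IGNORECASE)

def sectionize(lines):
    """Group spec lines under their CSI PART headings."""
    sections, current = {}, None
    for line in lines:
        m = PART_RE.match(line.strip())
        if m:
            current = f"PART {m.group(1)}"
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    return sections

spec = [
    "PART 1 - GENERAL",
    "1.1 SUMMARY",
    "PART 2 - PRODUCTS",
    "2.1 ELEVATOR",
]
print(sectionize(spec))
```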
For Submittals and Product Descriptions, the system:
- splits the text into smaller chunks.
- indexes the chunks' embeddings in a vector database for hybrid (semantic and keyword) search.
In addition, the system implements an agentic comparison workflow that compares the facts extracted from the CSI Specifications against the relevant sections in the Submittals and Product Descriptions.
Finally, the system generates a report for the user highlighting the inconsistencies, with evidence from the original documents to support the findings. The user can then review the report and take appropriate action.
| Component | Tool/Library | Rationale |
|---|---|---|
| Backend | FastAPI | Modern, fast web framework with automatic OpenAPI docs |
| | Python 3.13 | Latest Python version with performance improvements |
| | Pydantic | Data validation and settings management |
| | uvicorn | ASGI server |
| | uv | Python dependency manager |
| | Docling | PDF parsing with OCR support (unstructured.io can also be an option) |
| | LangChain | LLM orchestration and chaining |
| | LangGraph | State machine for agentic workflows |
| | LangSmith | Observability and tracing for LLM workflows |
| | OpenAI | GPT-4 family models for fact extraction and comparison |
| | FastEmbed (BAAI/bge-small-en-v1.5) | Embedding model for dense retrieval |
| | Qdrant | Vector store for hybrid search |
| | MongoDB | Document storage for facts and metadata |
| | Pint | Unit normalization and conversion |
| Frontend | React + TypeScript + Vite | Modern frontend framework with fast dev server |
| | Tailwind CSS | Utility-first CSS framework for styling |
| Evaluation | RAGAS | Retrieval performance evaluation framework |
- Fact Extraction: We use an LLM to extract technical facts from the CSI Specifications using a flexible EAV JSONL schema (Entity-Attribute-Value with evidence).
- Comparison: We use a LangChain agent to orchestrate the comparison workflow. The agent uses a Retrieval-Augmented Generation (RAG) approach to compare the extracted facts from the CSI Specifications against the relevant sections in the Submittals and Product Descriptions.
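For illustration, a single extracted fact might be serialized as one JSONL record like the following. The field names here are assumptions for the sketch, not the system's exact schema:

```python
import json

# Illustrative EAV (Entity-Attribute-Value with evidence) fact record.
fact = {
    "entity": "Passenger Elevator",
    "attribute": {"name": "rated_capacity", "raw": "Rated capacity: 2500 lbs"},
    "value": {"op": ">=", "num": 2500, "unit": "lb"},
    "evidence": {"section": "PART 2 - PRODUCTS", "page": 4},
}

# Facts are stored one JSON object per line (JSONL).
line = json.dumps(fact)
print(line)
```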
- Specifications:
- PDF document following the CSI standard format containing technical specifications for a specific product used in a construction project.
- This document is the source of truth for the technical requirements of a specific product used in the project. From it, we extract the technical facts that we will compare against the submittals.
- Submittals and Product Descriptions:
- PDFs created by contractors or manufacturers containing product and installation details about the products they are submitting for a specific project.
- These documents are the target of our comparison. The Agent will compare the technical facts extracted from the specifications against the relevant sections in the submittals.
Sample documents can be found in the `data` folder.
┌─────────────────────────────────────┐
│ SPECIFICATION │
│ → Notebook-style sectionization │
│ → CSI-aware chunking │
│ → Store sections + chunks │
│ → NO vector indexing │
├─────────────────────────────────────┤
│ SUBMITTAL / PRODUCT_DESCRIPTION │
│ → HybridChunker (table-aware) │
│ → Store chunks only │
│ → Vector indexing in Qdrant │
└─────────────────────────────────────┘
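The routing above can be sketched as follows; the function and step names are illustrative, not the actual service code:

```python
# Specifications are sectionized and chunked but never vector-indexed;
# submittals and product descriptions are chunked and indexed in Qdrant.
def route(doc_type: str) -> list[str]:
    if doc_type == "SPECIFICATION":
        return ["sectionize", "csi_chunk", "store_sections_and_chunks"]
    if doc_type in ("SUBMITTAL", "PRODUCT_DESCRIPTION"):
        return ["hybrid_chunk", "store_chunks", "index_in_qdrant"]
    raise ValueError(f"unknown document type: {doc_type}")

print(route("SPECIFICATION"))
```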
- Specification documents:
  - Strategy:
    - First pass: hierarchical sectionization (PART 1/2/3 structure following the CSI standard)
    - Second pass: chunking within sections, aligned to headings/bullets
    - ~700 tokens per chunk with 10-15% overlap
  - Rationale: leverages the hierarchical structure of CSI specifications to extract technical facts. Preserves meaning across headings/paragraphs; overlap improves recall for boundary cases.
The Spec document passes through a multi-step fact-extraction process, which is best understood using the following diagram:
- Submittals and Product Descriptions:
  - Strategy: simple paragraph-based chunking (falling back to sentence-based when paragraphs are too long)
  - ~700 tokens per chunk with 10-15% overlap
  - Rationale: no hierarchical structure to preserve; paragraphs are roughly independent. Overlap helps preserve context recall.
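The chunking strategy above can be sketched as follows, approximating the 700-token budget with a word count (the real system would use a tokenizer):

```python
# Paragraph-based chunking with ~10% overlap carried between chunks.
def chunk_paragraphs(paragraphs, max_words=700, overlap=0.1):
    chunks, current = [], []
    for para in paragraphs:
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            keep = int(len(current) * overlap)  # carry ~10% into the next chunk
            current = current[-keep:] if keep else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

paras = [("alpha " * 400).strip(), ("beta " * 400).strip()]
chunks = chunk_paragraphs(paras, max_words=700)
print(len(chunks))
```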
- Table Structure Preservation ✅
- Tables serialized to Markdown format (not triplet notation)
- Maintains readability and structure
- Critical for technical specifications
- Better Document Hierarchy ✅
- Respects document structure from Docling
- Extracts section titles from metadata
- Better context for retrieval
- More Granular Chunking ✅
- Smaller, more focused chunks
- Better for semantic search
- Easier for LLMs to process
- Intelligent Merging ✅
- `merge_peers=True` combines related sections
- Reduces fragmentation
- Maintains semantic coherence
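The table serialization mentioned above can be sketched roughly like this; a hypothetical helper, not Docling's actual API:

```python
# Serialize an extracted table as Markdown (rather than triplet notation),
# keeping the header/row structure readable for the LLM.
def table_to_markdown(header, rows):
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + "---|" * len(header))
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

md = table_to_markdown(["Property", "Value"], [["R-value", "R-30"], ["Thickness", "9.5 in"]])
print(md)
```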
┌─────────────────────────────────────────────────────────────┐
│ FastAPI Application │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Document │ │ Fact │ │ Comparison │ │
│ │ Processing │ │ Extraction │ │ Agent │ │
│ │ Endpoints │ │ Endpoints │ │ Endpoints │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
├─────────┼─────────────────┼─────────────────┼───────────────┤
│ │ │ │ │
│ ┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐ │
│ │ Document │ │ Fact │ │ Comparison │ │
│ │ Processing │ │ Extraction │ │ Agent │ │
│ │ Service │ │ Service │ │ Service │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
├─────────┼─────────────────┼─────────────────┼───────────────┤
│ │ │ │ │
│ ┌──────▼─────────────────▼─────────────────▼──────────┐ │
│ │ Core Domain Models │ │
│ │ (Section, Chunk, Fact, Entity, Attribute, Value) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Docling │ │ LangChain │ │ Qdrant │ │
│ │ Integration │ │ /LangGraph │ │ VectorStore │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ MongoDB │ │ OpenAI │ │ Pint │ │
│ │ Client │ │ LLM │ │ Unit Normaliz│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
RAG (Retrieval-Augmented Generation) system and LangGraph-based comparison agents for comparing specification facts against submittal documents.
┌─────────────────┐
│ Spec Fact │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Query Builder │
│ (Dense+Sparse) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Ensemble │
│ Retriever │
└────────┬────────┘
│
▼
┌─────────────────┐
│ LangGraph │
│ Comparison │
│ Agent │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Verdict + │
│ Evidence │
└─────────────────┘
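The flow above can be sketched end to end with stubbed components. Every function here is an illustrative stand-in for the real query builder, ensemble retriever, and LangGraph agent:

```python
def build_query(fact):
    # Dense query: natural-language restatement; sparse query: keywords.
    return {"dense": f"{fact['entity']} {fact['attribute']}",
            "sparse": [fact["attribute"], str(fact["value"])]}

def retrieve(query, chunks, k=2):
    # Stand-in for the ensemble retriever: rank chunks by keyword hits.
    scored = sorted(chunks, key=lambda c: -sum(kw.lower() in c.lower() for kw in query["sparse"]))
    return scored[:k]

def compare(fact, evidence):
    # Stand-in for the LLM agent: check the required value appears verbatim.
    hit = next((c for c in evidence if str(fact["value"]) in c), None)
    return {"verdict": "consistent" if hit else "unclear", "evidence": hit}

fact = {"entity": "elevator", "attribute": "capacity", "value": "2500 lbs"}
chunks = ["Rated capacity: 2500 lbs", "Door height: 84 in"]
result = compare(fact, retrieve(build_query(fact), chunks))
print(result)
```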
Once the server is running, access the interactive API documentation:
- Swagger UI: http://localhost:8000/api/v1/docs
- ReDoc: http://localhost:8000/api/v1/redoc
- Web Interface: http://localhost:3000
To evaluate the performance of our retrieval system, we used the RAGAS framework to compare how different retrieval strategies perform across four key metrics provided out of the box by the evaluation framework:
- Context Precision: Relevance of retrieved chunks
- Context Recall: Coverage of relevant information
- Faithfulness: Accuracy of generated answers
- Answer Relevancy: Relevance to query
We leveraged a "golden dataset" built on top of the Synthetic Data Set (Q&A) generated by RAGAS during the experimentation phase. Given the nature of the use case, we had to build a balanced golden dataset for Submittal/Product Description documents.
It bridges the Synthetic Data Generation (spec-only QA) to our task:
- Convert SDG QA → Spec Fact (keep attribute.raw, parse op/num/unit).
- Pair each fact with submittal chunk(s) labeled consistent / inconsistent / unclear (from real submittals or synthetics).
- Store a gold evidence string (in the submittal) when consistent/inconsistent is determinable.
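A hedged sketch of that conversion, with assumed field names and a simplistic numeric parser:

```python
import re

def qa_to_gold(qa, submittal_chunk, label, gold_evidence=None):
    # Parse a numeric requirement like ">= 2500 lbs" out of the answer text.
    m = re.search(r"(>=|<=|=)?\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)", qa["answer"])
    fact = {
        "attribute": {"raw": qa["question"]},
        "value": {"op": m.group(1) or "=", "num": float(m.group(2)), "unit": m.group(3)} if m else None,
    }
    return {"fact": fact, "chunk": submittal_chunk, "label": label, "evidence": gold_evidence}

gold = qa_to_gold(
    {"question": "What is the rated capacity?", "answer": ">= 2500 lbs"},
    "Rated capacity: 2500 lbs",
    label="consistent",
    gold_evidence="Rated capacity: 2500 lbs",
)
print(gold["fact"]["value"])
```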
With that golden dataset we can evaluate both halves:
- Retrieval (from submittals): context precision/recall (RAGAS) vs reference submittal contexts.
- Comparison: verdict accuracy and evidence quality.
Prioritizing context recall and precision, and given that the use case deals with technical, factual data, we conclude that the best retriever is the Ensemble Retriever combining the Parent Document Retriever and the BM25 Retriever.
The Construction Specification Assistant use case has a peculiarity that makes retrieval challenging: we are not relying on a human to prompt the system with a query. Instead, the query is a set of technical facts extracted from the specification documents (with the help of an LLM), not natural language. From these extracted facts, we use the LangChain Ensemble Retriever to retrieve the most relevant chunks from the submittal documents, performing a hybrid search with sparse (BM25, keyword-based) and dense (natural-language embedding) queries.
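LangChain's EnsembleRetriever merges the ranked lists from its sub-retrievers; the core idea can be sketched with Reciprocal Rank Fusion (RRF). The weights and `k` below are illustrative defaults, not this project's configuration:

```python
def rrf(rankings, weights=None, k=60):
    """Fuse several ranked lists of doc ids into one ranking via RRF."""
    weights = weights or [1.0] * len(rankings)
    scores = {}
    for w, ranking in zip(weights, rankings):
        for rank, doc in enumerate(ranking):
            # Each list contributes w / (k + rank) style scores; lower
            # ranks (earlier positions) contribute more.
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranked = ["chunk_a", "chunk_c", "chunk_b"]   # sparse retriever output
dense_ranked = ["chunk_b", "chunk_a", "chunk_d"]  # dense retriever output
print(rrf([bm25_ranked, dense_ranked]))
```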
Here are the comprehensive results collected by the evaluation:
📊 RAGAS Metrics Comparison Across Retrievers
| Retriever | faithfulness | answer_relevancy | context_precision | context_recall |
|---|---|---|---|---|
| naive_retrieval_chain | 0.4222 | 0.3443 | 0.4667 | 0.5039 |
| bm25_retrieval_chain | 0.4075 | 0.3140 | 0.4444 | 0.5277 |
| contextual_compression_retrieval_chain | 0.4078 | 0.4160 | 0.4519 | 0.5065 |
| multi_query_retrieval_chain | 0.4164 | 0.3808 | 0.4444 | 0.4967 |
| parent_document_retrieval_chain | 0.4258 | 0.4026 | 0.4370 | 0.5046 |
| ensemble_retrieval_chain | 0.3954 | 0.3916 | 0.4222 | 0.5106 |
Here is a visual representation of the results:
- Add support for local models (LLM and embeddings) using Ollama
- Enhance the UX/UI
- Add support for product descriptions
- Add support for visual grounding, allowing users to verify the provenance of the information by looking at the original PDFs
- Add user authentication
- Add support for WebSocket-based real-time updates
- Add support for advanced filtering and search
- Add support for dark mode
- Add support for parsing CAD drawings and comparing them against the specifications