This repository contains the Compliance API, a scalable microservice that takes a website URL and compliance policy URL, analyzes the content of each, and returns any non-compliant findings. Powered by a microservices architecture and advanced AI, this API enables robust compliance checking for diverse applications.
- Content Compliance Evaluation: Analyzes website content against specified compliance policies.
- Scalable Microservices Architecture: Enables horizontal scaling to handle large workloads.
- Advanced Content Processing: Uses Google's Generative AI model to generate and interpret complex compliance findings..
- Modular Architecture: Follows microservice and MVC architecture for easy scalability and maintenance.
- No Database: The project does not require a database, with data retrieved directly from the product pages.
- Logging: Utilizes Winston for logging errors and info-level logs to files.
This a REST API being deployed on Render, click here
curl --location 'https://complianceai.onrender.com/api/compliance/check' \
--header 'Content-Type: application/json' \
--data '{
"url":"https://mercury.com/",
"policyUrl":"https://stripe.com/docs/treasury/marketing-treasury"
}'
Note : The free version of Render is being used and can give a 500 Internal Server Error or a 502 Bad Gateway Error also.
This a Data Flow Diagram, to access click here
https://utfs.io/f/TJAg477eCsqbx0ooqgQTeMHhEvzBm0PW4lOx7y9nrFojAqYg
- Installation
- API Endpoints
- Directory
- Services
- Environment Variables
- Technologies Used
- Logging
- Error Handling
- Future Improvements
- Node.js (version 14+)
-
Clone the repository:
git clone https://github.com/tecshouvik/ComplianceAI.git cd ComplianceAI
-
Install dependencies:
npm install
-
Create a
.env
file and add the following environment variables:PORT=3000 LLM_API_KEY = <API KEY for using Google Gemini Model>
-
Start the server:
npm run dev
The server should be running on
http://localhost:3000
.
Evaluates a webpage's content against a compliance policy and returns any non-compliant findings.
- Content-Type: application/json
- url: (String) — The URL of the webpage to check.
- policyUrl (String) — The URL of the compliance policy to compare against.
Example request:
POST /api/compliance/check
(Body)
{
"url": "https://example.com/page",
"policyUrl": "https://example.com/policy"
}
- 200 OK: Returns an object with non-compliant findings, if any.
- 400 Bad Request: Missing or invalid parameters.
- 500 Internal Server Error: An unexpected error occurred.
{
"nonCompliantFindings": [
{
"finding": "Missing disclaimer",
"explanation": "The webpage lacks a required disclaimer as specified in the compliance policy."
}
]
}
nonCompliantFindings
: All the findingsfinding
: The heading of the non-compliant finding.explanation
: description on why it is non-compliant.
complianceAI/
├── Dockerfile # Docker configuration for containerization
├── logs/ # Log files (errors, exceptions)
│ ├── combined.log
│ ├── error.log
│ └── exceptions.log
├── src/
│ ├── controllers/
│ │ └── complianceController.ts # Controller logic for compliance checking
│ ├── services/
│ │ └── webScraperService.ts # Service to scrape webpage content
│ ├── utils/
│ │ └── logger.ts # Winston-based logger for logging
│ ├── app.ts # Express application setup
│ └── routes.ts # API routes
├── package.json # Project dependencies
├── tsconfig.json # Typescript Build Configs
└── README.md # Documentation
This service handles the navigation to the given url using Puppeteer, fetches the HTML content, and cleans it by removing unnecessary elements (e.g., scripts, images, pop-ups).
etchWebpageContent(url) → html
- Parameters:
url
: The product page URL.
- Returns:
- Cleaned HTML content.
This service sends the urlContent and the policyContent to a LLM API to dynamically find the nonc-compliant features as per the policyContent.
checkContentAgainstPolicy(pageContent, policyContent) → { findings }
- Parameters:
pageContent
: Cleaned HTML content of the target page.policyContent
: Cleaned HTML content of the policy page
- Returns:
- all the non-compliant findings
Variable | Description | Default |
---|---|---|
PORT |
The port on which the API runs | 3000 |
GEMINI_API_KEY |
llmService uses Gemini 1.5 flash Model | LLM configuration |
You can create a .env
file to manage environment-specific configurations.
To get a GEMINI api Key, click here
- Node.js: Backend JavaScript runtime.
- Express: Web framework for building the API.
- Puppeteer: For headless browser automation to fetch page content.
- Winston: For logging.
- Axios: For making HTTP requests.
This API uses the Winston logging library for logging errors and info-level messages to both console and files.
- Combined log: Captures both info and error messages (
combined.log
). - Error log: Captures only error messages (
error.log
).
By default, logs are written to the root directory of the project.
The API is designed to handle common errors, including:
- Invalid URL: If the product page URL is not valid or inaccessible, an error message will be returned.
- Error fetching page content: If the browser fails to scrap the HTML content, an error message will be returned with log.
- LLM Errors: If LLM fails or any other conflict ocurs in the LLMService, an error is returned with error log.
Example error response:
{
"error": "Error in compliance check"
}
-
LLM Service Integration: Replace the mock LLM service with a live LLM API for Compliance check.
-
Error Handling: Improve error messages for better debugging and clarity.
-
Support for Additional Compliance Check Platforms: Expand compatibility to handle more finance platforms and custom websites.
-
Storing Data in a Database: Implementing a database which further stores the cssSelectors for a given website which can later used a cache before requesting the LLM
-
Containerisation: Implementing Docker and composing it to make an image which will be easy to use