An automated Governance, Risk, and Compliance (GRC) and Open Source Intelligence (OSINT) platform designed to streamline corporate screening, anti-money laundering (AML), and know-your-customer (KYC) workflows. The application aggregates data from official state registries and open-source news feeds, and uses generative AI (Google Gemini 2.5) to deliver real-time risk assessments.
- Multi-Row Registry Search: Queries government databases via the DaData API by Company Name or Tax ID (INN), handling ambiguous queries and displaying comprehensive legal entity data (names, status, registration addresses).
- OSINT & Reputational Scraper: Automatically scans media sources for adverse keywords and negative news regarding the target entity.
- Risk Scoring Engine: Dynamically calculates a weighted risk score (0-100) based on sanctions listings, CBR blacklists, domain age verification, corporate status, and adverse media presence.
- AI-Driven Compliance Verdict: Integrates the Gemini 2.5 Flash model via the official 'google-genai' client to analyze negative context, producing structured summaries of regulatory, operational, and financial crime risks.
- Internationalization (i18n): Provides a language switcher (RU/EN) within the interface that adjusts both UI text and LLM instructions dynamically.
- Local Auditing & Archive: Automatically logs every screening session into an embedded SQLite database using standard SQL queries for historical compliance audits.
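To make the Risk Scoring Engine bullet more concrete, here is a minimal sketch of a weighted 0-100 score. The weights, the `Signals` fields, and the `media_cap` parameter are all hypothetical and chosen only for illustration; the project's actual weighting is not documented in this README.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 100); the real values are an assumption.
WEIGHTS = {
    "sanctions_hit": 40,
    "cbr_blacklist_hit": 25,
    "inactive_status": 15,
    "young_domain": 10,
    "adverse_media": 10,
}


@dataclass
class Signals:
    sanctions_hit: bool = False
    cbr_blacklist_hit: bool = False
    inactive_status: bool = False  # e.g. a liquidated or bankrupt entity
    young_domain: bool = False     # domain registered only recently
    adverse_media_hits: int = 0    # number of negative articles found


def risk_score(s: Signals, media_cap: int = 5) -> int:
    """Combine boolean flags and a capped media count into a 0-100 score."""
    score = 0.0
    for name in ("sanctions_hit", "cbr_blacklist_hit", "inactive_status", "young_domain"):
        if getattr(s, name):
            score += WEIGHTS[name]
    # Adverse media contributes proportionally and saturates at media_cap articles.
    hits = min(max(s.adverse_media_hits, 0), media_cap)
    score += WEIGHTS["adverse_media"] * hits / media_cap
    return round(min(score, 100))
```

With these illustrative weights, a sanctions hit alone already places an entity at 40, while adverse media can add at most 10 points no matter how many articles are found.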
- Language: Python 3.14+
- Frontend/UI: Streamlit Framework
- AI Orchestration: Google GenAI SDK (Gemini 2.5 Flash)
- Data Engineering: Pandas, SQLite3
- Integrations: DaData API (REST client)
- Environment: Fully containerized with VS Code Dev Containers; dependency management is handled via the 'uv' package installer.
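As an illustration of the DaData integration, the sketch below builds (but does not send) a request to DaData's public suggestions API, choosing between lookup by INN and search by name. It uses only the standard library; the project itself may use a dedicated client instead, and the helper names `looks_like_inn` and `build_party_request` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://suggestions.dadata.ru/suggestions/api/4_1/rs"


def looks_like_inn(query: str) -> bool:
    """Russian INNs are 10 digits (legal entities) or 12 digits (individuals)."""
    q = query.strip()
    return q.isdigit() and len(q) in (10, 12)


def build_party_request(query: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST request: exact lookup for an INN, suggestions for a name."""
    endpoint = "findById/party" if looks_like_inn(query) else "suggest/party"
    body = json.dumps({"query": query.strip()}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )
```

The prepared request can then be sent with `urllib.request.urlopen(req, timeout=10)`; the JSON response carries the matching entities under the `suggestions` key, which is where multiple rows for an ambiguous name would come from.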
- Input: User submits an entity name or INN through the Streamlit UI.
- Data Aggregation: Parallel data fetching from the DaData API and the local sanctions/CBR checkers.
- OSINT Analysis: Extraction of adverse news context.
- LLM Evaluation: Gemini 2.5 Flash processes the text array and outputs an expert risk verdict.
- Persistence: Data is stored in SQLite; results are rendered dynamically on the web UI.
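The aggregation and persistence steps above can be sketched as follows. This is a simplified illustration, not the project's code: `fetch_registry` and `check_sanctions` are stand-in stubs for the real lookups, and the `screenings` table layout is an assumed schema.

```python
import json
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


# Stubs standing in for the real DaData and sanctions/CBR lookups (assumptions).
def fetch_registry(query: str) -> dict:
    return {"name": query, "status": "ACTIVE"}


def check_sanctions(query: str) -> dict:
    return {"sanctioned": False, "cbr_blacklisted": False}


def aggregate(query: str) -> dict:
    """Run the independent lookups concurrently; in practice both are I/O-bound."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        registry = pool.submit(fetch_registry, query)
        sanctions = pool.submit(check_sanctions, query)
        return {"registry": registry.result(), "sanctions": sanctions.result()}


def save_screening(conn: sqlite3.Connection, query: str, result: dict) -> int:
    """Append one screening session to the audit table and return its row id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS screenings (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               query TEXT NOT NULL,
               result_json TEXT NOT NULL,
               created_at TEXT NOT NULL)"""
    )
    cur = conn.execute(
        "INSERT INTO screenings (query, result_json, created_at) VALUES (?, ?, ?)",
        (query, json.dumps(result), datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.lastrowid
```

A thread pool is a reasonable fit here because the lookups spend their time waiting on network or disk rather than computing. Parameterized queries (`?` placeholders) keep user-supplied entity names out of the SQL text.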
- Python 3.9 - 3.14
- Docker Desktop (optional, for Dev Containers)
- Clone the repository:
git clone https://github.com/abrenmarie/regtech-osint-finding.git
cd regtech-osint-finding
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables:
Create a .env file in the root directory and populate it with your keys:
DADATA_API_KEY=your_dadata_key
DADATA_SECRET_KEY=your_dadata_secret
OPENSANCTIONS_API_KEY=your_opensanctions_key
GEMINI_API_KEY=your_gemini_api_key
- Run the application:
streamlit run app.py
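If the app fails at startup, a missing key is a likely cause. The following sketch checks that the four keys from the configuration step are present. It assumes the .env values have already been loaded into the process environment (for example by a loader such as python-dotenv, which is an assumption about the project), and `missing_keys` is a hypothetical helper, not part of the repository.

```python
import os

REQUIRED_KEYS = (
    "DADATA_API_KEY",
    "DADATA_SECRET_KEY",
    "OPENSANCTIONS_API_KEY",
    "GEMINI_API_KEY",
)


def missing_keys(env=None) -> list:
    """Return the required keys that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not str(env.get(key, "")).strip()]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing environment variables: " + ", ".join(absent))
    else:
        print("All required keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching real credentials.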
Deployment in Streamlit Cloud:
To maintain functionality in production, specify the environment variables in the Streamlit Cloud dashboard under Advanced Settings -> Secrets. Secrets are entered in TOML format, so string values must be quoted:
DADATA_API_KEY = "your_key"
DADATA_SECRET_KEY = "your_secret"
GEMINI_API_KEY = "your_gemini_key"
License: This project is developed for educational and portfolio presentation purposes in the fields of International Economics, Corporate Law, and RegTech Automation.