AutoC is an automated tool designed to extract and analyze Indicators of Compromise (IoCs) from open-source threat intelligence sources.
- Threat Intelligence Parsing: Parses blogs, reports, and feeds from various OSINT sources.
- IoC Extraction: Automatically extracts IoCs such as IP addresses, domains, file hashes, and more.
- Visualization: Displays extracted IoCs and analysis in a user-friendly interface.
The fastest way to get started with AutoC is to run it with Docker (using docker-compose).
Make sure to set up the .env file with your API keys before running the app (see the Configuration section below for more details).
git clone https://github.com/barvhaim/AutoC.git
cd AutoC
docker-compose up --build
Once the app is up and running, you can access it at http://localhost:8000
- With crawl4ai:
docker-compose --profile crawl4ai up --build
- With Milvus vector database:
docker-compose --profile milvus up --build
- With both:
docker-compose --profile crawl4ai --profile milvus up --build
- Install Python 3.11 or later. (https://www.python.org/downloads/)
- Install the uv package manager (https://docs.astral.sh/uv/getting-started/installation/).
  - For Linux and macOS, you can use the following command:
curl -LsSf https://astral.sh/uv/install.sh | sh
  - For Windows, you can use the following command:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
- Clone the project repository and navigate to the project directory.
git clone https://github.com/barvhaim/AutoC.git
cd AutoC
- Install the required Python packages using uv.
uv sync
- Configure the .env file with your API keys (see the Configuration section below for more details).
Set up API keys by adding them to the .env file (use the .env.sample file as a template).
You can use any of several LLM providers (e.g., IBM watsonx.ai, OpenAI); you will configure which one to use in the next step.
cp .env.sample .env
- watsonx.ai by IBM ("watsonx") - Get API Key
- OpenAI ("openai") - Experimental
- RITS internal IBM ("rits")
- Ollama ("ollama") - Experimental
| Provider (LLM_PROVIDER) | Models (LLM_MODEL) |
|---|---|
| watsonx.ai by IBM (watsonx) | meta-llama/llama-3-3-70b-instruct, ibm-granite/granite-3.1-8b-instruct |
| RITS (rits) | meta-llama/llama-3-3-70b-instruct, ibm-granite/granite-3.1-8b-instruct, deepseek-ai/DeepSeek-V3 |
| OpenAI (openai) | gpt-4.1-nano |
| Ollama (ollama) - Experimental | granite3.2:8b |
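For example, to use watsonx.ai with the first Llama model from the table above, set the following in your .env file (variable names are taken from the table header):
LLM_PROVIDER=watsonx
LLM_MODEL=meta-llama/llama-3-3-70b-instruct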
By default, AutoC uses a combination of the docling and beautifulsoup4 libraries to extract blog post content, fetching the page behind the scenes with the requests library.
Alternatively, you can use Crawl4AI, which fetches the blog post content with a headless browser; this is more reliable but requires additional setup.
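For reference, the default (non-Crawl4AI) fetch path boils down to something like the following simplified sketch. It uses only requests and BeautifulSoup and omits the docling step, so treat it as an illustration rather than AutoC's actual parser:

```python
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    """Fetch a blog post and return its readable text (simplified sketch)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Drop script/style tags so only visible content remains
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)

if __name__ == "__main__":
    print(fetch_article_text("https://example.com")[:500])
```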
To enable Crawl4AI, you need a Crawl4AI backend server, which can be run using Docker:
docker-compose --profile crawl4ai up -d
The crawl4ai service uses a profile configuration, so it only starts when explicitly requested with the --profile crawl4ai flag.
Then set the following environment variables in the .env file to point to the Crawl4AI server:
USE_CRAWL4AI_HEADLESS_BROWSER_HTML_PARSER=true
CRAWL4AI_BASE_URL=http://localhost:11235
AutoC processes analyst questions about articles in two modes:
- Individual mode (default): Each question is processed separately with individual LLM calls
- Batch mode: All questions are processed together in a single LLM call for improved performance
To enable batch mode, set the environment variable in the .env file:
QNA_BATCH_MODE=true
You can also control this via API settings by including "qna_batch_mode": true in your request.
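For example, a request that enables batch mode might look like the sketch below. The /api/extract endpoint path and the payload layout are assumptions for illustration only; the documented part is the "qna_batch_mode": true setting.

```python
import requests

# Hypothetical endpoint path and payload shape -- adjust to the actual AutoC API.
response = requests.post(
    "http://localhost:8000/api/extract",
    json={
        "url": "https://example.com/threat-report",  # blog post to analyze
        "settings": {"qna_batch_mode": True},        # documented setting name
    },
    timeout=300,
)
print(response.status_code, response.json())
```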
Benefits of batch mode:
- Reduces the number of API calls from N (one per question) to a single call
- Potentially faster processing for multiple questions
- More cost-effective for large question sets
- Automatic fallback to individual mode if batch processing fails
AutoC supports Retrieval-Augmented Generation (RAG) for intelligent context retrieval during Q&A processing:
- Standard mode (default): Uses the entire article content as context for answering questions
- RAG mode: Intelligently retrieves only the most relevant chunks of content for each question
To enable RAG mode, set the environment variable in the .env file:
QNA_RAG_MODE=true
You can also control this via API settings by including "qna_rag_mode": true in your request.
Benefits of RAG mode:
- More targeted and relevant answers by focusing on specific content sections
- Improved answer quality for long articles by reducing noise
- Better handling of multi-topic articles
- Automatic content chunking and semantic search
- Efficient processing of large documents
Note: RAG mode only works with individual Q&A processing mode. When batch mode (QNA_BATCH_MODE=true) is enabled, RAG mode is automatically disabled and the full article content is used as context.
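The following minimal sketch summarizes how the two settings interact, as described above (an illustration of the documented behavior, not AutoC's actual code):

```python
def resolve_qna_mode(qna_batch_mode: bool, qna_rag_mode: bool) -> str:
    """Sketch of the documented precedence between batch mode and RAG mode."""
    if qna_batch_mode:
        # Batch mode wins: RAG is disabled and the full article content is
        # used as context for a single LLM call covering all questions.
        return "batch (full article context, RAG disabled)"
    if qna_rag_mode:
        # Individual mode with RAG: each question gets only the most relevant
        # retrieved chunks as context.
        return "individual + RAG (retrieved chunks as context)"
    # Default: each question is answered against the full article content.
    return "individual (full article context)"

print(resolve_qna_mode(qna_batch_mode=True, qna_rag_mode=True))  # batch wins
```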
RAG Configuration:
RAG mode requires a Milvus vector database. Configure the connection in your .env file:
RAG_MILVUS_HOST=localhost
RAG_MILVUS_PORT=19530
RAG_MILVUS_USER=
RAG_MILVUS_PASSWORD=
RAG_MILVUS_SECURE=false
To run Milvus with Docker:
docker-compose --profile milvus up -d
How it works:
- Article content is automatically chunked and indexed into Milvus vector store
- For each analyst question, the most relevant content chunks are retrieved
- Only the relevant context is sent to the LLM for answer generation
- Vector store is automatically cleaned up after processing
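As a rough illustration of this flow, the sketch below uses pymilvus's MilvusClient together with a toy embed() placeholder standing in for a real embedding model; the collection name, chunk size, and helper function are assumptions, not AutoC's implementation:

```python
import hashlib
from pymilvus import MilvusClient

def embed(text: str, dim: int = 128) -> list[float]:
    """Toy hashed bag-of-words embedding; a real embedding model goes here."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    return vec

article_text = "The actor used spearphishing emails. The payload beaconed to 203.0.113.7 over DNS."
question = "Which IP address did the payload beacon to?"

client = MilvusClient(uri="http://localhost:19530")
collection = "autoc_article_chunks"  # assumed collection name

# 1. Chunk the article content and index it into the Milvus vector store
chunks = [article_text[i:i + 500] for i in range(0, len(article_text), 500)]
client.create_collection(collection_name=collection, dimension=128)
client.insert(
    collection_name=collection,
    data=[{"id": i, "vector": embed(c), "text": c} for i, c in enumerate(chunks)],
)

# 2. Retrieve the most relevant chunks for the analyst question
hits = client.search(
    collection_name=collection,
    data=[embed(question)],
    limit=3,
    output_fields=["text"],
)
context = "\n\n".join(hit["entity"]["text"] for hit in hits[0])

# 3. Only `context` (not the whole article) would be sent to the LLM here

# 4. Clean up the vector store after processing
client.drop_collection(collection_name=collection)
```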
AutoC can detect MITRE ATT&CK TTPs in the blog post content, which can be used to identify the techniques and tactics used by the threat actors.
To enable MITRE ATT&CK TTP detection, set the following environment variables in the .env file:
HF_TOKEN=<your_huggingface_token>
DETECT_MITRE_TTPS_MODEL_PATH=dvir056/mitre-ttp # Hugging Face model path for MITRE ATT&CK TTPs detection
Information about model training: https://github.com/barvhaim/attack-ttps-detection?tab=readme-ov-file#-mitre-attck-ttps-classification
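If you want to try the detection model on its own, a minimal sketch with the Hugging Face transformers library could look like this (it assumes the checkpoint is a standard text-classification model and is not AutoC's internal code):

```python
import os
from transformers import pipeline

# Assumes HF_TOKEN and DETECT_MITRE_TTPS_MODEL_PATH are set as in the .env example above
classifier = pipeline(
    "text-classification",
    model=os.environ.get("DETECT_MITRE_TTPS_MODEL_PATH", "dvir056/mitre-ttp"),
    token=os.environ.get("HF_TOKEN"),
    top_k=3,  # return the top candidate techniques with their scores
)

snippet = "The malware established persistence via a scheduled task and exfiltrated data over DNS."
print(classifier(snippet))
```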
Run the AutoC tool with the following command:
uv run python cli.py extract --help (to see the available options)
uv run python cli.py extract --url <blog_post_url>
Assuming the app .env file is configured correctly, you can run the app using one of the following options:
To run the app locally, you'll need Node.js 20 and npm installed on your machine. We recommend using nvm to manage Node.js versions.
cd frontend
nvm use
npm install
npm run build
Once the build is complete, you can run the app using the following command from the root directory:
cd ..
uv run python -m uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
Once the app is up and running, you can access it at http://localhost:8000
For development purposes, you can run the app in development mode using the following command:
Start the backend server:
uv run python -m uvicorn main:app --reload
In a separate terminal, start the frontend development server:
cd frontend
nvm use
npm install
npm run build
npm run dev
Once the app is up and running, you can access it at http://localhost:5173
Make sure you have Claude Desktop, the uv package manager, and Python installed on your machine.
Clone the project repository and navigate to the project directory.
Install the required Python packages using uv.
uv sync
Edit the Claude Desktop config file and add the following lines to the mcpServers section:
{
"mcpServers": {
"AutoC": {
"command": "uv",
"args": [
"--directory",
"/PATH/TO/AutoC",
"run",
"mcp_server.py"
]
}
}
}
Restart Claude Desktop, and you should see the AutoC MCP server in the list of available MCP servers.