The goal of this project is to derive initial architecture proposals by automatically identifying cohesive groups of functional requirements. Each group (or cluster) represents a potential software component or bounded context, enabling a behavior-driven approach to architectural insight and system decomposition.
The pipeline automates:
- Embedding requirements into semantic vectors
- Clustering them to reveal cohesive functional groups
- Storing results in Qdrant for fast semantic search
- Visualizing clusters in 2D for exploration
- Serving results via a lightweight HTTP server
- Flexible input: Load functional requirements from `.txt` or `.json` files
- Modern embeddings: Uses `all-MiniLM-L6-v2` (384-dim sentence transformers) for semantic representation
- Controllable clustering: Agglomerative Clustering with cosine distance; cluster granularity tuned via `--cluster-distance`
- Semantic search: Store vectors and metadata in Qdrant for fast querying and filtering
- Interactive visualization: 2D scatter plots with hover details using Plotly
- HTTP API:
  - `/clusters.html` – Interactive cluster map
  - `/clusters.json` – Machine-readable cluster assignments
  - `/embeddings?limit=N&vector_len=M` – Inspect raw vectors
    - N = number of embedding vectors to return
    - M = length (dimensionality) of each embedding vector
- Docker-ready: Runs in containers alongside Qdrant for easy setup
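To illustrate how a cosine-distance threshold such as `--cluster-distance` controls cluster granularity, here is a minimal pure-Python sketch. It is not the project's implementation: the function names are hypothetical, average linkage is an assumption (the linkage the project uses is not stated here), and the toy 3-dimensional vectors stand in for real 384-dimensional embeddings.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_clusters(vectors, distance_threshold):
    """Merge the closest clusters until none is closer than the threshold.

    Average linkage is an assumption made for this sketch.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise cosine distance between the members of two clusters.
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best_pair, best_dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best_dist >= distance_threshold:
            break  # remaining clusters are too far apart to merge
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy "embeddings": two vectors near the x-axis, two near the y-axis.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]

for threshold in (0.001, 0.65, 1.5):
    print(threshold, agglomerative_clusters(toy_vectors, threshold))
```

In this toy setup, a very small threshold leaves every vector in its own cluster, a mid-range threshold groups the two near-parallel pairs, and a large threshold merges everything into one cluster.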
Create `functional_requirements.txt` in the app directory (one requirement per line):
The system must authenticate a customer using a customer number and a password.
The system must reject any order-related action for a customer who is blacklisted.
...

Or use JSON format (`requirements.json`):
[
"The system must authenticate...",
"The system must reject..."
]

Build and run:

docker-compose up --build

# Cluster visualization
http://localhost:8000/clusters.html
# Raw cluster data
http://localhost:8000/clusters.json
# Embedding samples
http://localhost:8000/embeddings?limit=5

- Trust `clusters.json` over the 2D plot: Clustering happens in 384D; the plot is a visualization aid
- Small, tight clusters indicate specialized subdomains (e.g., SMS parsing, blueprint ordering)
- Larger, more diverse clusters, ...
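The endpoints can also be queried from a script rather than a browser. The sketch below builds the `/embeddings` URL from the documented `limit` and `vector_len` parameters using only the standard library. The helper names are hypothetical, and it is an assumption that `/embeddings` returns JSON like `/clusters.json` does; the response schema is not specified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # host and port taken from the URLs above


def embeddings_url(limit, vector_len=None, base_url=BASE_URL):
    """Build the /embeddings URL with the documented query parameters."""
    params = {"limit": limit}
    if vector_len is not None:
        params["vector_len"] = vector_len
    return f"{base_url}/embeddings?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON.

    Requires the server to be running; assumes a JSON response body.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(embeddings_url(5))
print(embeddings_url(5, vector_len=8))
```

With the containers running, `fetch_json(f"{BASE_URL}/clusters.json")` would retrieve the cluster assignments for inspection.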
After clustering, you can manually assign meaningful names to each cluster to reflect candidate components or subdomains. This helps turn the automated clustering output into an actionable architecture blueprint.
Example mapping:
{
"CustomerIdentity": ["FR-1", "FR-2", "FR-56"],
"ProductCatalog": ["FR-4", "FR-5", "FR-6"],
"OrderIntake": ["FR-8", "FR-16", "FR-25-32"],
"OtherClusters": []
}

- Open the `clusters.json` file generated by the pipeline.
- Review the functional requirements in each cluster.
- Assign a descriptive component name for each cluster, for example: `CustomerIdentity`, `OrderIntake`, `PaymentProcessing`.
- Replace the automatically generated cluster keys with your chosen names.
- Save this mapping. It can now be used as a reference for designing bounded contexts or modules.
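The renaming step can be scripted once the names are chosen. The sketch below assumes that `clusters.json` is a JSON object mapping each generated cluster key to a list of requirement IDs, which may not match the actual schema; the function name and the generated keys `"0"` and `"1"` are hypothetical.

```python
import json


def rename_clusters(clusters, names):
    """Return a new mapping with generated cluster keys replaced by chosen names.

    Keys without an entry in `names` are kept unchanged. A name that would
    collide with an existing key raises ValueError instead of silently merging.
    """
    renamed = {}
    for key, requirement_ids in clusters.items():
        new_key = names.get(key, key)
        if new_key in renamed:
            raise ValueError(f"duplicate component name: {new_key}")
        renamed[new_key] = list(requirement_ids)
    return renamed


# Hypothetical generated output and hand-picked names.
generated = {
    "0": ["FR-1", "FR-2", "FR-56"],
    "1": ["FR-4", "FR-5", "FR-6"],
    "2": ["FR-8", "FR-16"],
}
chosen_names = {"0": "CustomerIdentity", "1": "ProductCatalog"}

mapping = rename_clusters(generated, chosen_names)
print(json.dumps(mapping, indent=2))
```

Cluster `"2"` has no chosen name yet, so it keeps its generated key until one is assigned.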
To pass CLI arguments, edit the `app` service in docker-compose.yml and set its `command:` field, which overrides the default script invocation. To list the available options, run:
python fr_clustering.py --help
- `--print-fr`: Print loaded requirements and exit
- `--fr-file PATH`: Override requirements file path
- `--projection [umap|tsne]`: Choose 2D projection method (default: umap)
- `--perplexity FLOAT`: t-SNE perplexity (default: 30.0)
- `--cluster-distance FLOAT`: Cosine distance threshold for clustering (default: 0.65)
- `--embedding-model [all-MiniLM-L6-v2|all-mpnet-base-v2]`: Choose SentenceTransformer model for embeddings (default: all-MiniLM-L6-v2)

Usage in docker-compose.yaml:
# command: overrides the default CMD in the Dockerfile, allowing you to specify CLI arguments.
# for example:
command: python fr_clustering.py --projection tsne --perplexity 5 --cluster-distance 0.2 --embedding-model all-mpnet-base-v2

💡 Tip for small datasets (< 30 items): Use `--projection tsne --perplexity 5 --cluster-distance 0.2` for finer-grained clusters.
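For reference, the documented options and defaults map naturally onto an `argparse` parser. The following is a sketch of how such a parser could look, not the actual code in `fr_clustering.py`; the function name `build_parser` and the help strings are illustrative.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options and defaults documented above."""
    parser = argparse.ArgumentParser(prog="fr_clustering.py")
    parser.add_argument("--print-fr", action="store_true",
                        help="Print loaded requirements and exit")
    parser.add_argument("--fr-file", metavar="PATH", default=None,
                        help="Override requirements file path")
    parser.add_argument("--projection", choices=["umap", "tsne"], default="umap",
                        help="2D projection method")
    parser.add_argument("--perplexity", type=float, default=30.0,
                        help="t-SNE perplexity")
    parser.add_argument("--cluster-distance", type=float, default=0.65,
                        help="Cosine distance threshold for clustering")
    parser.add_argument("--embedding-model",
                        choices=["all-MiniLM-L6-v2", "all-mpnet-base-v2"],
                        default="all-MiniLM-L6-v2",
                        help="SentenceTransformer model for embeddings")
    return parser


# Defaults, then the small-dataset settings from the tip above.
defaults = build_parser().parse_args([])
small = build_parser().parse_args(
    "--projection tsne --perplexity 5 --cluster-distance 0.2".split()
)
print(defaults.projection, defaults.cluster_distance)
print(small.projection, small.perplexity, small.cluster_distance)
```

Note that `argparse` exposes dashed options as underscore attributes, so `--cluster-distance` is read as `args.cluster_distance`.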
| Variable | Default | Description |
|---|---|---|
| QDRANT_HOST | localhost | Qdrant service hostname |
| QDRANT_HTTP_PORT | 6333 | Qdrant HTTP API port |
| FR_FILE | functional_requirements.txt | Path to requirements file |
| LOG_LEVEL | INFO | Logging verbosity (DEBUG, INFO, WARNING) |
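As an illustration of how these variables and their defaults could be consumed, here is a small sketch that reads them into a settings object. The `Settings` class and `load_settings` function are hypothetical names and are not taken from the project's code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_http_port: int
    fr_file: str
    log_level: str


def load_settings(env=None):
    """Read the documented variables, falling back to the defaults in the table.

    `env` defaults to os.environ; pass a plain dict to override it in tests.
    """
    env = os.environ if env is None else env
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_http_port=int(env.get("QDRANT_HTTP_PORT", "6333")),
        fr_file=env.get("FR_FILE", "functional_requirements.txt"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


# With no variables set, every field takes its documented default.
print(load_settings({}))
# Overrides as they might be supplied through docker-compose (example values).
print(load_settings({"QDRANT_HOST": "qdrant", "LOG_LEVEL": "debug"}))
```

Converting the port to `int` at load time surfaces a malformed `QDRANT_HTTP_PORT` early rather than at the first connection attempt.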
See `requirements.txt` for the full list of dependencies.
This project is released under the MIT License.

