Skip to content

A FreshService Integration with a front end review area that attempts to merge duplicate tickts within a cosine smilarity threshold.

License

Notifications You must be signed in to change notification settings

Juno445/Similar-Tickets

Repository files navigation

Semantic-Ticket-Search

Detect and automatically merge duplicate Freshservice tickets using state-of-the-art sentence embeddings, FAISS Approximate-Nearest-Neighbours (ANN) search and department-aware filtering with domain-specific heuristics.

The repository contains everything needed to train a similarity index from your historical tickets and serve a low-latency API that can be wired to a Freshservice webhook.


✨ Features

β€’ End-to-end pipeline: data harvest β†’ embedding β†’ ANN index β†’ real-time duplicate detection
β€’ Language-agnostic – powered by Sentence-Transformers
β€’ ⚑ <10 ms query latency with in-memory FAISS
β€’ Hot-reload of new snapshots without downtime
β€’ Smart merge workaround implemented via Freshservice REST API (adds notes + closes duplicate)
β€’ Department-aware duplicate detection – only compares tickets within the same department
β€’ Human-in-the-loop review console with Streamlit UI for manual validation
β€’ Active learning with pair labeling and confidence tracking
β€’ Immutable snapshots with atomic updates and rollback capability
β€’ Prometheus metrics & structured, colourised logging
β€’ 100% reproducible via Docker Compose ‑ no local Python required
β€’ Extensive test-suite (pytest) & modular design for rapid iteration


πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Trainer (batch)         β”‚
β”‚                                    β”‚
β”‚ 1. Fetch N days of tickets         β”‚
β”‚ 2. Embed unseen tickets            β”‚
β”‚ 3. Build FAISS IP index            β”‚
β”‚ 4. Write immutable snapshot        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚  shared volume (/shared)
               β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Serve (FastAPI)         β”‚
β”‚                                    β”‚
β”‚ β€’ /webhook – duplicate check       β”‚
β”‚ β€’ /review – human review UI        β”‚
β”‚ β€’ /candidates – active learning    β”‚
β”‚ β€’ /label – pair feedback           β”‚
β”‚ β€’ /reloadIndex – hot reload        β”‚
β”‚ β€’ /metrics – Prometheus            β”‚
β”‚ β€’ /healthz – readiness             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚       Review Console (Streamlit)   β”‚
β”‚                                    β”‚
β”‚ β€’ Manual duplicate validation      β”‚
β”‚ β€’ Confidence-based filtering       β”‚
β”‚ β€’ Active learning feedback         β”‚
β”‚ β€’ Batch review workflows           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Immutable Snapshots are laid out as:

/shared/
  β”œβ”€ snapshots/2024-06-01T12-00-00Z/
  β”‚    β”œβ”€ vectors.faiss   # binary FAISS index
  β”‚    └─ meta.json       # tickets + metadata
  β”œβ”€ snapshots/2024-06-01T18-30-15Z/
  β”‚    β”œβ”€ vectors.faiss   # newer snapshot
  β”‚    └─ meta.json       # with more tickets
  └─ current  β†’  snapshots/2024-06-01T18-30-15Z/  # symlink flipped atomically

Serve mounts /shared read-only; the symlink ensures zero-downtime upgrades and easy rollbacks.


πŸ“‚ Directory layout (top-level)

Path Purpose
src/ Library code shared by trainer & API
src/similarity/ Core similarity detection & active learning
src/freshservice/ Freshservice API client & merge logic
src/ann_index.py FAISS index loader with hot-reload
src/review_console.py Streamlit-based manual review UI
src/webhook_server.py FastAPI server with all endpoints
trainer/ Batch job entry-point & Dockerfile
serve/ Startup script & Dockerfile for FastAPI
tests/ pytest test-suite
requirements.txt Runtime & training dependencies
dev-requirements.txt Development & testing dependencies
docker-compose.yml One-command deployment

πŸš€ Quick start (Docker Compose)

  1. Copy .env.example β†’ .env and fill at least:
    FS_DOMAIN=acme.freshservice.com
    FS_API_KEY=xxxxxxxxxxxxxxxx
    GROUP_IDS=1234,5678
  2. Run:
    docker compose up --build
    β€’ The trainer runs once, writes the first snapshot and exits.
    β€’ The api container waits until the snapshot is present, then exposes HTTP :8000.

Minimal curl example

curl -X POST http://localhost:8000/webhook \
     -H "Content-Type: application/json" \
     -d '{"ticket_id": 809188}'

Response:

{"merged": true}

βš™οΈ Configuration reference

All options are supplied via environment variables (Docker reads them from .env).

Variable Description Default
FS_DOMAIN Freshservice sub-domain (without https) – (required)
FS_API_KEY / FS_API_KEY_FILE API key or path to file containing the key – (required)
GROUP_IDS / GROUP_ID Comma-separated list of ticket group IDs – (required)
INDEX_PATH Shared volume for snapshots /shared
DAYS_BACK Look-back window for trainer 60
EMBEDDING_MODEL sentence-transformers model name all-MiniLM-L6-v2
SIMILARITY_THRESHOLD Probability cut-off for automatic merge (serve) 0.9
RELOAD_TOKEN Secret header for /reloadIndex (unset)
WEBHOOK_SECRET HMAC secret for webhook authentication (unset)
HF_TOKEN Hugging Face access token (avoids rate limits) (unset)
HF_TOKEN_FILE Path to file containing HF token (unset)
MODEL_CACHE_DIR Directory for caching downloaded models /app/models
FS_PER_PAGE Pagination size when fetching tickets 100
FS_MAX_PAGES Max pages per trainer run 10
UNMERGED_LOG Path for tickets that were not auto-merged /data/unmerged_tickets.log
REVIEW_USER Username for review console attribution anonymous
REVIEW_SIM_LOWER Lower similarity threshold for review candidates 0.65
REVIEW_SIM_UPPER Upper similarity threshold for review candidates 0.90
REVIEW_MAX Maximum candidates to show in review console 100
REVIEW_DAYS_BACK Days to look back for review console tickets 60

πŸ€— Hugging Face Authentication & Model Caching

To avoid rate limiting and enable offline model usage, configure Hugging Face authentication:

Option 1: Environment Variable

# Add to your .env file
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx

Option 2: Token File

# Store token in a file (useful for Docker secrets)
echo "hf_xxxxxxxxxxxxxxxxxxxxxxxxx" > /path/to/hf_token.txt
HF_TOKEN_FILE=/path/to/hf_token.txt

Model Caching

Models are automatically cached after first download:

# Default cache directory (will be created if it doesn't exist)
MODEL_CACHE_DIR=/app/models

# In Docker, mount this as a volume for persistence:
# volumes:
#   - model_cache:/app/models

Benefits:

  • Avoids rate limits when downloading models from Hugging Face
  • Faster startup after first run (models loaded from cache)
  • Offline usage once models are cached
  • Bandwidth savings in production deployments

πŸ” Webhook Security

When WEBHOOK_SECRET is set, the /webhook endpoint requires HMAC-SHA256 authentication:

# Generate signature
payload='{"ticket_id": 12345}'
signature=$(echo -n "$payload" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" -hex | cut -d" " -f2)

# Send authenticated request
curl -X POST http://localhost:8000/webhook \
     -H "Content-Type: application/json" \
     -H "X-Webhook-Signature: $signature" \
     -d "$payload"

If WEBHOOK_SECRET is unset, authentication is disabled (backward compatibility).


πŸ‹οΈβ€β™€οΈ Trainer details

Run manually (outside Docker) with your local Python:

pip install -r requirements.txt
python -m trainer.train_freshservice

β€’ Only new tickets are embedded – the script loads previous metadata for incremental updates.
β€’ After finishing, current symlink is updated atomically.
β€’ Log output is written to console; failures are non-zero exit codes (ideal for CI schedulers).

Adding custom heuristics

Open src/freshservice/freshservice_client.py and tweak _calculate_duplicate_probability() or adjust ignore_subject_phrases in src/config.py.


🌐 Serve / API endpoints

Method Path Body Description
POST /webhook { "ticket_id": 123 } Fetches ticket, compares to ANN index within same department, merges if probability β‰₯ threshold. Returns { "merged": bool }.
GET /review – Human-in-the-loop review UI (HTML page).
GET /candidates ?max_pairs=25 Returns potential duplicate pairs for manual review (JSON).
POST /label { "ticket1_id": "123", "ticket2_id": "456", "is_duplicate": true } Label a ticket pair for active learning.
POST /reloadIndex – Reload snapshot & embedding model. Requires header X-Reload-Token.
GET /healthz – Readiness probe, returns vector count.
GET /metrics – Prometheus/OpenMetrics exposition.

Hot-reload flow

  1. Trainer writes new snapshot.
  2. External process calls /reloadIndex (or rely on watchdog).
  3. Serve clears internal lru-caches and memory-maps the new FAISS file – all in <200 ms.

🏒 Department-Aware Duplicate Detection

The system automatically enforces department boundaries when detecting potential duplicates. This ensures data privacy and more relevant results:

How it works

  • Automatic extraction: Department IDs are extracted from Freshservice API responses (department_id field)
  • Similarity filtering: Only tickets within the same department are compared for similarity
  • Cross-department isolation: Tickets from different departments never influence each other's duplicate detection
  • Backward compatibility: Tickets without department IDs fall back to global comparison

Benefits

  • πŸ”’ Data Privacy: Prevents cross-department information leakage
  • 🎯 Relevant Results: More accurate duplicate detection within organizational boundaries
  • ⚑ Performance: Reduced search space leads to faster similarity calculations
  • πŸ“Š Better Accuracy: Department context improves probability calculations

Configuration

Department filtering is automatic and requires no additional configuration. The system:

  • Reads department_id from Freshservice ticket API responses
  • Gracefully handles tickets with missing or null department IDs
  • Maintains full backward compatibility with existing deployments

πŸ•΅οΈ Human-in-the-Loop Review Console

The system includes a Streamlit-based review console for manual validation of potential duplicates that fall below the automatic merge threshold. This enables continuous improvement through active learning.

Features

  • Confidence-based filtering: Shows tickets with similarity between configurable thresholds (default: 65%-90%)
  • Session persistence: Tracks already-reviewed pairs within a session to avoid duplicates
  • One-click actions: Approve merges or mark as not-duplicates with single button clicks
  • Real-time feedback: Labels are immediately fed back to the training system
  • Batch workflows: Process multiple candidates efficiently in sequence

Running the Review Console

# Install additional dependencies
pip install streamlit

# Set environment variables for the console
export REVIEW_USER="your-name"
export REVIEW_SIM_LOWER="0.65"    # Lower similarity threshold
export REVIEW_SIM_UPPER="0.90"    # Upper similarity threshold  
export REVIEW_MAX="100"           # Max candidates to show
export REVIEW_DAYS_BACK="60"      # Days to look back for tickets

# Launch the console
streamlit run src/review_console.py --server.port 8501

The console will be available at http://localhost:8501 and integrates with the same Freshservice configuration as the main system.

πŸ€– Automatic Ticket Merging

Yes, the system automatically merges tickets when configured with a Freshservice webhook. Here's how it works:

Automatic Merge Process

  1. Webhook Trigger: Freshservice sends a webhook when a new ticket is created
  2. Similarity Analysis: The system compares the new ticket against existing tickets in the same department
  3. Probability Calculation: Uses cosine similarity + domain heuristics (email matching, department context)
    • 85% similarity score + 10% email bonus + 5% department bonus
  4. Automatic Merge: If probability β‰₯ threshold (default 90%), tickets are merged automatically
  5. Merge Implementation: Adds private notes to both tickets and closes the duplicate

Merge Mechanism

Since Freshservice doesn't have a native merge API, the system implements a merge workaround:

  • Adds explanatory private note to the winning (original) ticket
  • Adds reference note to the duplicate ticket pointing to the master
  • Closes the duplicate ticket (status = 5)
  • All future correspondence happens on the winning ticket

Safety Features

  • High Threshold: Default 90% confidence prevents false positives
  • Department Isolation: Only merges tickets within the same department
  • Audit Trail: All merges are logged with detailed probability scores
  • Manual Override: Unmerged tickets (below threshold) are logged for human review

Configuration

SIMILARITY_THRESHOLD=0.9    # 90% confidence required for auto-merge
WEBHOOK_SECRET=your_secret  # Optional webhook authentication

Note: Automatic merging only occurs when tickets are submitted via the /webhook endpoint. The trainer and review console do not perform automatic merges.


πŸ“Š Observability

β€’ Logs – Unified stdlib + Loguru formatter. Colourised in console, JSON-less plain text in files.
β€’ Metrics – Two counters + one histogram exported via prometheus_client; no background threads.


πŸ§ͺ Tests

pip install -r requirements.txt -r dev-requirements.txt
pytest -q

The suite uses pytest + coverage and mocks HTTP calls – no network dependency.


πŸ”„ CI/CD Pipeline

Pipeline Jobs

  1. Test Suite - Runs tests across Python 3.10, 3.11, and 3.12

    • Installs dependencies with pip caching
    • Executes pytest with coverage reporting (75% minimum)
    • Uploads coverage to Codecov
  2. Code Quality - Enforces code standards

    • Runs pre-commit hooks (Black, isort, flake8)
    • Type checking with mypy
    • Security scanning with Bandit
  3. Docker Build - Validates containerization

    • Builds both trainer and serve images
    • Uses Docker layer caching for speed
    • Validates docker-compose configuration
  4. Security Scan - Vulnerability detection

    • Trivy filesystem scanning
    • Results uploaded to GitHub Security tab

Local Development Setup

# Install pre-commit hooks (one-time setup)
pip install pre-commit
pre-commit install

# Run quality checks locally
pre-commit run --all-files

# Run tests with coverage
pytest --cov=src --cov=trainer --cov-report=html

# Type checking
mypy src/ trainer/

# Security scan
bandit -r src/ trainer/

All tools are configured via pyproject.toml for consistency across environments.


🀝 Contributing

  1. Fork & clone, create a branch.
  2. Adhere to PEP-8 and run pytest before pushing.
  3. For new modules write unit tests under tests/.
  4. Open a pull request – CI will run lint + tests.

Feel free to open issues for bugs or feature requests. PRs are warmly welcomed!


πŸ“ License

Apache 2.0 – see LICENSE. Commercial use, distribution and modification are permitted.


❀️ Acknowledgements

β€’ Sentence-Transformers
β€’ FAISS
β€’ FastAPI & Uvicorn
β€’ Loguru

About

A FreshService Integration with a front end review area that attempts to merge duplicate tickts within a cosine smilarity threshold.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published