Data & Analytics Engineer. I build robust data pipelines, read source code, and ship fixes upstream.
MSc Data Analytics. Building pipelines and dev tools on the side. I believe compliance shouldn't mean spreadsheets and AI shouldn't require the cloud. Yorkshire, UK.
OpsMind — On-prem AI query tool for manufacturing. docs
- Ask production questions in plain English, get SQL results in 5 seconds
- LangGraph multi-step agent (6-node state graph) with 5-stage SQL validation
- MCP server architecture: database + doc search as decoupled tool servers
- pgvector + ChromaDB retrieval, runtime-loaded domain docs
- Gemma 3 12B via Ollama — no data leaves the factory
- 7 business domains, formal agent specs, ty type checker in CI
Production Analytics Pipeline — Incremental ETL from fish production ERP
- 15K+ rows daily from 4 SI Integreater tables, validated with Pydantic
- FastAPI REST API (11 endpoints) + Next.js dashboard + Power BI export
- Prefect orchestration, Sentry monitoring, Docker + OpenTofu deployment
- Batch tracking, yield analysis, shelf life management, traceability | 53 tests
UK Crime Pipeline — Police UK API to PostgreSQL and BigQuery. streamlit / looker studio / hugging face
- 99,675 records, 10 cities, 6 dbt marts (including outcome analysis and YoY trends), 65 tests
- Declarative data validation + SLO monitoring (freshness, completeness, volume)
- Polars-based alternative ingestion, pipeline maturity scorecard
- 3 CI/CD workflows with ty type checker, diskcache + stamina for API resilience
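As a rough illustration of the SLO monitoring bullet above, here is a minimal sketch of freshness, completeness, and volume checks over a loaded batch. It is not the pipeline's actual code: `SloThresholds`, `check_slos`, and the threshold values are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SloThresholds:
    # Hypothetical thresholds; the real pipeline's values are not stated.
    max_age: timedelta = timedelta(hours=24)   # freshness: newest load must be this recent
    min_completeness: float = 0.99             # share of rows with all required fields set
    min_rows: int = 1000                       # volume: minimum rows expected per load


def check_slos(
    rows: list[dict],
    required_fields: list[str],
    loaded_at: datetime,
    now: datetime,
    thresholds: SloThresholds = SloThresholds(),
) -> dict[str, bool]:
    """Return a mapping of SLO name to True (met) or False (breached)."""
    total = len(rows)
    # A row counts as complete only if every required field is present and not None.
    complete = sum(
        1
        for row in rows
        if all(row.get(field) is not None for field in required_fields)
    )
    completeness = complete / total if total else 0.0
    return {
        "freshness": now - loaded_at <= thresholds.max_age,
        "completeness": completeness >= thresholds.min_completeness,
        "volume": total >= thresholds.min_rows,
    }
```

Returning one boolean per SLO keeps the check declarative: the thresholds live in data, and the caller decides whether a breach should fail the run or only raise an alert.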
Compliance Dashboard — BRC/HACCP food safety. live / hugging face
- Full batch traceability from catch area to packed product
- Real-time temperature monitoring with automatic alerts
- Allergen matrix (14 EU allergens), weight variance with z-score anomaly detection
- MCP server (5 compliance tools), NL query for auditors, declarative validation, SLO monitoring
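The z-score anomaly detection mentioned for weight variance can be sketched roughly as follows. This assumes a plain sample z-score over one batch of pack weights and a cutoff of 3; the function name `zscore_anomalies` and the cutoff are illustrative assumptions, not the dashboard's actual implementation.

```python
from statistics import mean, stdev


def zscore_anomalies(weights: list[float], threshold: float = 3.0) -> list[tuple[int, float, float]]:
    """Return (index, weight, z) for each weight whose |z| exceeds the threshold."""
    if len(weights) < 2:
        return []  # a sample standard deviation needs at least two points
    mu = mean(weights)
    sigma = stdev(weights)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        return []  # all weights identical, so nothing can be an outlier
    flagged = []
    for index, weight in enumerate(weights):
        z = (weight - mu) / sigma
        if abs(z) > threshold:
            flagged.append((index, weight, z))
    return flagged
```

One caveat with this approach: with a sample standard deviation, the largest possible |z| in a batch of n values is (n - 1) / sqrt(n), so a cutoff of 3 can never fire on batches of 10 or fewer weights. Small batches need a lower cutoff or a baseline computed from historical data.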
sql-sop — SQL linter on PyPI. pip install sql-sop
- 18 rules (5 errors, 10 warnings, 3 structural), 55 tests, sqlparse AST parsing
- Fluent API + structural rules: implicit cross joins, nested subqueries, unused CTEs (v0.3.0)
- Pre-commit hook + GitHub Action for CI/CD integration
SQL Ops Reviewer — GitHub Action reviewing SQL in PRs
- Rule-based pre-commit (instant) + AI review in CI (deep)
- Pairs with sql-sop for two-layer quality
ForThePeople UK — UK citizen transparency platform. hugging face
- 13 council-level dashboards: weather, population, housing, crime, health, transport, education
- 50+ government schemes directory, essential services links
- API response caching and validation layer
I learn tools by reading their source. I reverse-engineered the drt connector architecture, shipped 5 destination connectors, and wrote the official connector tutorial — all merged. Same approach everywhere: read the internals, find the gap, ship the fix.
drt · pandas · ChromaDB · pgcli · ollama · superset · plotly · fpdf2
Python, SQL, dbt, PostgreSQL, BigQuery, FastAPI, Streamlit, Prefect, LangGraph, Ollama, Docker, Polars, pandas, Pydantic, pytest, GitHub Actions


