Pawansingh3889/README.md

Pawan Singh Kapkoti

Data & Analytics Engineer. I build robust data pipelines, read source code, and ship fixes upstream.

MSc Data Analytics. Building pipelines and dev tools on the side. I believe compliance shouldn't mean spreadsheets and AI shouldn't require the cloud. Yorkshire, UK.

Portfolio


Projects

OpsMind — On-prem AI query tool for manufacturing. docs

  • Ask production questions in plain English, get SQL results in 5 seconds
  • LangGraph multi-step agent (6-node state graph) with 5-stage SQL validation
  • MCP server architecture: database + doc search as decoupled tool servers
  • pgvector + ChromaDB retrieval, runtime-loaded domain docs
  • Gemma 3 12B via Ollama — no data leaves the factory
  • 7 business domains, formal agent specs, ty type checker in CI
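
The README does not list the five validation stages, so the following is only a minimal sketch of what staged validation of generated SQL could look like. The stage choices, the `validate_sql` name, and the `allowed_tables` parameter are all assumptions for illustration, not OpsMind's actual code.

```python
import re

# Hypothetical deny-list; OpsMind's real stages are not documented here.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.I
)

def validate_sql(sql: str, allowed_tables: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the query passed."""
    stripped = sql.strip().rstrip(";").strip()
    # Stage 1: reject empty input outright.
    if not stripped:
        return ["empty query"]
    problems = []
    # Stage 2: allow a single statement only.
    if ";" in stripped:
        problems.append("multiple statements")
    # Stage 3: the statement must begin with SELECT or WITH.
    if not re.match(r"(?is)^(select|with)\b", stripped):
        problems.append("not a read-only SELECT")
    # Stage 4: no write or DDL keywords anywhere (naive: also flags literals).
    if FORBIDDEN.search(stripped):
        problems.append("forbidden keyword")
    # Stage 5: every table after FROM/JOIN must be on the allow-list.
    tables = {
        t.lower()
        for t in re.findall(r"(?i)\b(?:from|join)\s+([a-z_][\w.]*)", stripped)
    }
    unknown = tables - {t.lower() for t in allowed_tables}
    if unknown:
        problems.append("unknown tables: " + ", ".join(sorted(unknown)))
    return problems

ok = validate_sql("SELECT batch_id FROM batches WHERE yield_pct < 90;", {"batches"})
bad = validate_sql("DROP TABLE batches", {"batches"})
```

A regex pass like this is deliberately crude; a parser-based check would handle string literals and CTE names more reliably.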

Production Analytics Pipeline — Incremental ETL from fish production ERP

  • 15K+ rows daily from 4 SI Integreater tables, validated with Pydantic
  • FastAPI REST API (11 endpoints) + Next.js dashboard + Power BI export
  • Prefect orchestration, Sentry monitoring, Docker + OpenTofu deployment
  • Batch tracking, yield analysis, shelf life management, traceability | 53 tests
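
Incremental extraction is commonly done with a high-water mark. The sketch below illustrates that idea with the standard-library `sqlite3` module; the `production` table, its columns, and `extract_incremental` are hypothetical names, and the Pydantic validation step mentioned above is left out to keep the example dependency-free.

```python
import sqlite3

def extract_incremental(conn, table, watermark):
    """Fetch rows modified after `watermark`; return (rows, new_watermark)."""
    # `table` is interpolated directly, so it must come from trusted config.
    rows = conn.execute(
        f"SELECT id, weight_kg, modified_at FROM {table} "
        "WHERE modified_at > ? ORDER BY modified_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark only when new rows were seen.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

# Hypothetical source table standing in for an ERP extract.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE production (id INTEGER, weight_kg REAL, modified_at TEXT)")
conn.executemany(
    "INSERT INTO production VALUES (?, ?, ?)",
    [
        (1, 12.5, "2024-01-01T08:00:00"),
        (2, 11.9, "2024-01-01T09:00:00"),
        (3, 13.1, "2024-01-02T08:00:00"),
    ],
)

first, wm = extract_incremental(conn, "production", "2024-01-01T08:30:00")
second, wm2 = extract_incremental(conn, "production", wm)  # nothing new
```

ISO 8601 timestamps sort correctly as text, which is why a plain string comparison works as the watermark here.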

UK Crime Pipeline — Police UK API to PostgreSQL and BigQuery. streamlit / looker studio / hugging face

  • 99,675 records, 10 cities, 6 dbt marts (including outcome analysis and YoY trends), 65 tests
  • Declarative data validation + SLO monitoring (freshness, completeness, volume)
  • Polars-based alternative ingestion, pipeline maturity scorecard
  • 3 CI/CD workflows with ty type checker, diskcache + stamina for API resilience
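
As a rough illustration of the three SLO dimensions named above (freshness, completeness, volume), here is a minimal standard-library sketch. The thresholds, field names, and `check_slos` function are assumptions made for the example and are not taken from the pipeline itself.

```python
from datetime import datetime, timedelta, timezone

def check_slos(rows, now, max_age=timedelta(hours=24), min_rows=2,
               required=("id", "category"), min_complete=0.95):
    """Return a pass/fail flag for each SLO dimension."""
    newest = max((r["loaded_at"] for r in rows), default=None)
    # A row is complete when every required field is present and non-empty.
    complete = sum(
        all(r.get(field) not in (None, "") for field in required) for r in rows
    )
    return {
        "freshness": newest is not None and now - newest <= max_age,
        "completeness": bool(rows) and complete / len(rows) >= min_complete,
        "volume": len(rows) >= min_rows,
    }

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"id": 1, "category": "burglary", "loaded_at": now - timedelta(hours=2)},
    {"id": 2, "category": None, "loaded_at": now - timedelta(hours=3)},
]
report = check_slos(rows, now)
```

In this sample the data is fresh and large enough, but one of the two rows lacks a category, so completeness fails.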

Compliance Dashboard — BRC/HACCP food safety. live / hugging face

  • Full batch traceability from catch area to packed product
  • Real-time temperature monitoring with automatic alerts
  • Allergen matrix (14 EU allergens), weight variance with z-score anomaly detection
  • MCP server (5 compliance tools), NL query for auditors, declarative validation, SLO monitoring
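
Z-score anomaly detection on pack weights can be sketched in a few lines. The sample weights, the 2.5 threshold, and the `zscore_outliers` name are illustrative assumptions, not values from the dashboard.

```python
from statistics import mean, stdev

def zscore_outliers(weights, threshold=2.5):
    """Return (index, weight) pairs whose absolute z-score exceeds threshold."""
    if len(weights) < 2:
        return []  # stdev needs at least two observations
    mu = mean(weights)
    sigma = stdev(weights)  # sample standard deviation
    if sigma == 0:
        return []  # all weights identical, nothing to flag
    return [
        (i, w) for i, w in enumerate(weights) if abs(w - mu) / sigma > threshold
    ]

# Hypothetical pack weights in grams; the last pack is clearly heavy.
weights = [500.0, 501.0, 499.0, 500.0, 502.0, 498.0, 500.0, 501.0, 499.0, 560.0]
flagged = zscore_outliers(weights)
```

With small batches a single outlier inflates the standard deviation, which caps the achievable z-score, so a threshold below the textbook 3.0 is often needed.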

sql-sop — SQL linter on PyPI. pip install sql-sop

  • 18 rules (5 errors, 10 warnings, 3 structural), 55 tests, sqlparse AST parsing
  • Fluent API + structural rules: implicit cross joins, nested subqueries, unused CTEs (v0.3.0)
  • Pre-commit hook + GitHub Action for CI/CD integration
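
To give a feel for one of the structural rules, the sketch below flags implicit cross joins (comma-separated tables in a FROM clause) using only regular expressions. It is not sql-sop's implementation, which the list above says is built on sqlparse; `find_implicit_cross_joins` is a hypothetical name.

```python
import re

# Capture text after FROM up to the next major clause or statement end.
FROM_CLAUSE = re.compile(
    r"\bfrom\b(.*?)(?=\bwhere\b|\bgroup\s+by\b|\border\s+by\b|\blimit\b|;|$)",
    re.I | re.S,
)

def find_implicit_cross_joins(sql):
    """Return FROM clauses that list tables separated by top-level commas."""
    findings = []
    for match in FROM_CLAUSE.finditer(sql):
        depth = 0
        for ch in match.group(1):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # A comma outside parentheses means a second table source.
                findings.append(match.group(1).strip())
                break
    return findings

implicit = find_implicit_cross_joins(
    "SELECT * FROM orders, customers WHERE orders.cid = customers.id"
)
explicit = find_implicit_cross_joins(
    "SELECT a, b FROM orders JOIN customers ON orders.cid = customers.id"
)
```

A token-based parser avoids the false positives a regex would hit on commas inside string literals or comments.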

SQL Ops Reviewer — GitHub Action reviewing SQL in PRs

  • Rule-based pre-commit (instant) + AI review in CI (deep)
  • Pairs with sql-sop for two-layer quality

ForThePeople UK — UK citizen transparency platform. hugging face

  • 13 council-level dashboards: weather, population, housing, crime, health, transport, education
  • 50+ government schemes directory, essential services links
  • API response caching and validation layer

Open source

I learn tools by reading their source. I reverse-engineered the drt connector architecture, shipped 5 destination connectors, and wrote the official connector tutorial — all merged. Same approach everywhere: read the internals, find the gap, ship the fix.

drt · pandas · ChromaDB · pgcli · ollama · superset · plotly · fpdf2


Stack

Python, SQL, dbt, PostgreSQL, BigQuery, FastAPI, Streamlit, Prefect, LangGraph, Ollama, Docker, Polars, pandas, Pydantic, pytest, GitHub Actions

Pinned

  1. uk-crime-pipeline (Public)

    End-to-end pipeline: Police UK API to PostgreSQL + BigQuery. dbt staging/marts, 65 tests, 3 CI/CD workflows, Looker Studio + Streamlit dashboards.

    Python

  2. OpsMind (Public)

    On-prem AI query tool for manufacturing. NL-to-SQL in 5 seconds. LangGraph agent, pgvector + ChromaDB RAG, Gemma 3 12B via Ollama. 19 tables, read-only.

    Python 1

  3. uk-education-attainment (Public)

    ML analysis of UK A-Level attainment gaps by ethnicity, gender & deprivation using DfE data

    Jupyter Notebook

  4. manufacturing-compliance-dashboard (Public)

    BRC/HACCP food safety dashboard. Batch traceability, temperature monitoring, allergen matrix, weight variance. Streamlit + Sentry.

    Python

  5. sql-ops-reviewer (Public)

    GitHub Action reviewing SQL in PRs with local AI. Pairs with sql-sop for two-layer quality: rule-based pre-commit + AI review in CI.

    Python

  6. sql-guard (Public)

    Fast rule-based SQL linter on PyPI (sql-sop). 15 rules, 21 tests, 0.08s scans. Pre-commit hook + GitHub Action. 195+ monthly downloads.

    Python