This project uses:
- PRJ912.csv (AIS telemetry)
- PRJ896.csv (port calls)
- Optional docs (NIS2 PDF + public ISPS pages)
It now has three layers:
- Deterministic analytics/forecast (source of truth for counts, congestion, trends)
- Deterministic carbon inventory (TTW pollutants + WTW CO2e, with uncertainty + provenance)
- Optional RAG evidence (representative examples, not numeric truth)
The recommended fully free deployment is:
- run Eagle Eye on your own Mac
- use the full local data/processed and data/chroma
- expose the current Streamlit UI through a free public tunnel
This is the only realistic way to keep full parity with your local model at zero infrastructure cost.
One-command launcher:
./run_free_public_app.sh

What it does:
- checks Docker Desktop is running
- checks OPENAI_API_KEY
- checks full local assets exist
- builds the Streamlit Docker image
- runs the UI container on port 8501
- opens a free public tunnel (cloudflared by default, ngrok supported)
- prints the public URL
Required local inputs:
- data/processed/arrivals_daily.parquet
- data/processed/events.parquet
- data/chroma/chroma.sqlite3
- data/chroma/traffic_metadata_index.csv
- OPENAI_API_KEY in shell or .env
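The asset check can be pictured as a small preflight function. This is a minimal Python sketch for illustration, not the launcher's actual logic: `missing_inputs` is a hypothetical helper, and only the four file paths come from the list above.

```python
from pathlib import Path

# Paths copied from the "Required local inputs" list.
# The helper below is hypothetical and not part of the repository.
REQUIRED_FILES = [
    "data/processed/arrivals_daily.parquet",
    "data/processed/events.parquet",
    "data/chroma/chroma.sqlite3",
    "data/chroma/traffic_metadata_index.csv",
]


def missing_inputs(root: str = ".") -> list[str]:
    """Return the required files that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing local assets:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required local assets are present.")
```

Running it from the project root before starting the launcher gives a quick list of anything that still has to be built by the pipeline.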
Important:
- the public URL is temporary
- it only stays live while your Mac is on
- Docker Desktop and the tunnel process must keep running
- this free path does not need Streamlit Cloud, remote Chroma, or hosted bundles
- data/chroma is mounted read-write so local retrieval provenance works inside Docker
Optional stable URL modes:
- NGROK_DOMAIN=<your-domain.ngrok-free.app> ./run_free_public_app.sh (requires ngrok auth + reserved domain)
- CLOUDFLARE_TUNNEL_TOKEN=... CLOUDFLARE_TUNNEL_HOSTNAME=<host.yourdomain.com> ./run_free_public_app.sh
cd "/Users/praharshchintu/Documents/New project"
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Set the API key (only needed for RAG indexing and evidence retrieval):

export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

Or place it in a local .env file:

cp .env.example .env

Put files in /Users/praharshchintu/Documents/New project/data/:
- PRJ912.csv
- PRJ896.csv
- CELEX_32022L2555_EN_TXT.pdf (recommended)
- ILOIMOCodeOfPracticeEnglish.pdf (optional)
For ISPS, prefer public official pages via URLs, not unofficial full-text PDFs.
./run_demo_pipeline.sh

This runs, in order:
- src.predict.data_prep
- src.kpi.build_kpis
- src.carbon.build
- destination / ETA / anomaly training
- RAG indexing (if OPENAI_API_KEY is set)
python -m src.predict.data_prep \
--traffic_csv data/PRJ912.csv \
--traffic_csvs data/PRJ896.csv \
--out_dir data/processed

python -m src.kpi.build_kpis \
--traffic_csv data/PRJ912.csv \
--traffic_csvs data/PRJ896.csv \
--out_dir data/processed

Outputs include:
- data/processed/arrivals_daily.parquet
- data/processed/arrivals_hourly.parquet
- data/processed/dwell_time.parquet
- data/processed/occupancy_hourly.parquet
- data/processed/congestion_daily.parquet
- data/processed/kpi_capabilities.json

python -m src.carbon.build \
--processed_dir data/processed \
--out_dir data/processed

Outputs include:
- data/processed/carbon_segments.parquet
- data/processed/carbon_emissions_segment.parquet
- data/processed/carbon_emissions_daily_port.parquet
- data/processed/carbon_emissions_call.parquet
- data/processed/carbon_evidence.parquet
- data/processed/carbon_params_version.json

python -m src.predict.train_destination --training_rows data/processed/training_rows.parquet --model_dir models
python -m src.predict.train_eta --training_rows data/processed/training_rows.parquet --model_dir models
python -m src.predict.anomaly --training_rows data/processed/training_rows.parquet --model_dir models

python -m src.index.build_index \
--traffic_csv data/PRJ912.csv \
--traffic_csvs data/PRJ896.csv \
--persist_dir data/chroma \
--pdf_paths data/CELEX_32022L2555_EN_TXT.pdf data/ILOIMOCodeOfPracticeEnglish.pdf \
--doc_urls https://www.imo.org/en/OurWork/Security/Pages/SOLAS-XI-2%20ISPS-Code.aspx https://www.mpa.gov.sg/web/portal/home/port-of-singapore/operations-and-services/maritime-security/isps-code

python -m src.forecast.backtest --processed_dir data/processed

./run_streamlit.sh

UI:
- Ask-only interface with integrated analytics + forecasting + retrieval evidence trace.
- Includes answer, evidence, confidence, chart, method steps, and retrieval provenance.
Use these exact values in Streamlit Cloud:
- Repository: Praharsh-Projects/Eagle_Eye
- Branch: main
- Main file path: app/streamlit_app.py
In cloud environments where data/processed is not present, the app falls back automatically to the bundled runtime KPI data in demo_data/processed.
In cloud environments where data/chroma is not present, the app falls back automatically to the bundled demo vector index in demo_data/chroma.
To enable vector retrieval evidence (vector_id, chunk_id, distance), set Streamlit secret:
OPENAI_API_KEY = "...".
To run full-scale retrieval on cloud (same behavior as local), connect a remote Chroma service:
- VECTOR_DB_MODE = "remote"
- CHROMA_HOST = "<your-chroma-host>"
- CHROMA_PORT = "8000" (or your service port)
- CHROMA_SSL = "true" (for HTTPS services)
- Optional: CHROMA_TENANT, CHROMA_DATABASE, CHROMA_AUTH_TOKEN, CHROMA_AUTH_HEADER
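One way to read these settings is to parse them into a single typed object before any client is created. The sketch below is illustrative only: `ChromaSettings` and `load_chroma_settings` are hypothetical names, and the defaults (an unset VECTOR_DB_MODE meaning "local", port 8000, SSL off) are assumptions rather than the app's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ChromaSettings:
    """Hypothetical container for the variables listed above."""
    mode: str
    host: Optional[str]
    port: int
    ssl: bool
    tenant: Optional[str] = None
    database: Optional[str] = None
    auth_token: Optional[str] = None
    auth_header: Optional[str] = None


def load_chroma_settings(env: Mapping[str, str]) -> ChromaSettings:
    """Parse settings from an environment-like mapping.

    Assumed defaults (not confirmed by the project): mode "local",
    port 8000, SSL disabled.
    """
    mode = env.get("VECTOR_DB_MODE", "local").strip().lower()
    host = env.get("CHROMA_HOST") or None
    if mode == "remote" and host is None:
        raise ValueError("VECTOR_DB_MODE=remote requires CHROMA_HOST")
    return ChromaSettings(
        mode=mode,
        host=host,
        port=int(env.get("CHROMA_PORT", "8000")),
        ssl=env.get("CHROMA_SSL", "false").strip().lower() == "true",
        tenant=env.get("CHROMA_TENANT") or None,
        database=env.get("CHROMA_DATABASE") or None,
        auth_token=env.get("CHROMA_AUTH_TOKEN") or None,
        auth_header=env.get("CHROMA_AUTH_HEADER") or None,
    )
```

Calling `load_chroma_settings(os.environ)` at startup fails fast when remote mode is requested without a host, which is usually easier to diagnose than a connection error later on.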
To bootstrap full processed runtime data on cloud from a hosted bundle, set:
- APP_PROCESSED_BUNDLE_URL = "https://.../eagle_eye_processed_bundle.tar.gz"
- Optional for anomaly/jump detection without retriever: APP_EVENTS_BUNDLE_URL = "https://.../eagle_eye_events_bundle.tar.gz"
- Optional for local-bundle retrieval fallback on hosts with enough disk: APP_CHROMA_BUNDLE_URL = "https://.../eagle_eye_chroma_bundle.tar.gz"
- Or, for large Chroma stores split across multiple hosted files: APP_CHROMA_MANIFEST_URL = "https://.../eagle_eye_chroma_manifest.json"
Create that bundle locally:
python -m src.utils.package_cloud_bundle \
--processed_dir data/processed \
--out dist/eagle_eye_processed_bundle.tar.gz \
--events_out dist/eagle_eye_events_bundle.tar.gz \
--chroma_dir data/chroma \
--chroma_out dist/eagle_eye_chroma_bundle.tar.gz

Index directly to remote service:

export VECTOR_DB_MODE=remote
export CHROMA_HOST=<your-chroma-host>
export CHROMA_PORT=8000
export CHROMA_SSL=true
python -m src.index.build_index \
--traffic_csv data/PRJ912.csv \
--traffic_csvs data/PRJ896.csv \
--persist_dir data/chroma

Cloud parity summary:
- Deterministic analytics/forecast parity: bundled in demo_data/processed, or bootstrap via APP_PROCESSED_BUNDLE_URL
- Retrieval parity: not realistic on free Streamlit Cloud with the full local vector store
- AIS jump/spoof anomaly parity without retriever: requires APP_EVENTS_BUNDLE_URL because those queries need row-level AIS events
- On non-Streamlit hosts with enough disk, APP_CHROMA_BUNDLE_URL or APP_CHROMA_MANIFEST_URL can bootstrap a local full vector store
Daily congestion proxy per port:
- arrivals_ratio = arrivals_day / median(arrivals_port)
- dwell_ratio = median_dwell_day / median(dwell_port)
- congestion_index = arrivals_ratio + dwell_ratio
If dwell is unavailable, congestion falls back to arrivals-only ratio.
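The proxy and its fallback can be sketched in a few lines of Python. This is an illustration of the formulas above, not the project's implementation: the function name, its parameters, and the decision to reject a non-positive baseline median are all assumptions.

```python
from statistics import median
from typing import Optional, Sequence


def congestion_index(
    arrivals_day: float,
    arrivals_port: Sequence[float],
    median_dwell_day: Optional[float] = None,
    dwell_port: Optional[Sequence[float]] = None,
) -> float:
    """Daily congestion proxy for one port (hypothetical helper)."""
    arrivals_baseline = median(arrivals_port)
    if arrivals_baseline <= 0:
        # Assumption: a zero baseline is treated as an error here.
        raise ValueError("arrivals baseline must be positive")
    arrivals_ratio = arrivals_day / arrivals_baseline

    # Fallback: with no dwell data, use the arrivals-only ratio.
    if median_dwell_day is None or not dwell_port:
        return arrivals_ratio

    dwell_baseline = median(dwell_port)
    if dwell_baseline <= 0:
        raise ValueError("dwell baseline must be positive")
    return arrivals_ratio + median_dwell_day / dwell_baseline


# 15 arrivals against a median of 10, dwell of 9 h against a median of 6 h.
print(congestion_index(15, [8, 10, 12], 9, [4, 6, 8]))  # 3.0
print(congestion_index(15, [8, 10, 12]))                # 1.5 (arrivals only)
```

Under this reading a typical day scores about 2 when both ratios are available and about 1 under the arrivals-only fallback, so values from the two modes are not directly comparable.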
Supported well:
- arrivals volume, busiest day/hour, dwell proxy, congestion proxy, historical-pattern forecasts
- TTW pollutants (CO2e, NOx, SOx, PM) and WTW CO2e with confidence + uncertainty intervals
Out of scope (clean refusal):
- berth crane utilization
- gate queue length
- TEU throughput
- yard occupancy from terminal ops systems
- How many vessels arrived at LUBECK in March 2022?
- Is Friday usually busier than Monday at LVVNT?
- What will congestion look like next Friday at LUBECK?
- Why was LVVNT congested on 2021-01-01?
- Any unusual spikes in arrivals at GDANSK in 2021-02?
- What are TTW emissions at SEGOT in March 2022 for CO2e, NOx, SOx, and PM?
- Show WTW CO2e emissions at LVVNT between 2022-02-01 and 2022-02-28.
Carbon/emissions outputs are standardized with shared formatting and interpretation helpers:
- Absolute greenhouse-gas values are shown in tCO2e and auto-scale to ktCO2e / MtCO2e for large totals.
- Intensity metrics are shown with explicit units such as:
  - kgCO2e/vessel-call
  - tCO2e/day
  - kgCO2e/hour
- Congestion stays dimensionless and is labelled as index.
- Maritime operational units remain:
  - distance = nautical miles (nm)
  - speed = knots (kn)
  - time = UTC (24-hour)
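The auto-scaling rule for absolute values could look like the sketch below. The function name is hypothetical, and the switch points (1,000 t and 1,000,000 t) and two-decimal formatting are assumptions; the project's shared helpers may use different thresholds.

```python
def format_co2e(tonnes: float) -> str:
    """Format an absolute CO2e value, scaling the unit for large totals.

    Assumed thresholds: >= 1,000 t switches to ktCO2e,
    >= 1,000,000 t switches to MtCO2e.
    """
    magnitude = abs(tonnes)
    if magnitude >= 1_000_000:
        return f"{tonnes / 1_000_000:.2f} MtCO2e"
    if magnitude >= 1_000:
        return f"{tonnes / 1_000:.2f} ktCO2e"
    return f"{tonnes:.2f} tCO2e"


print(format_co2e(12.5))       # 12.50 tCO2e
print(format_co2e(4_200))      # 4.20 ktCO2e
print(format_co2e(2_500_000))  # 2.50 MtCO2e
```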
Carbon result views now include:
- unit-aware metric cards
- relative comparison bar (Low / Moderate / High / Very High) based on dataset percentiles
- chart annotations (Finding: ...) for highest/lowest/spike/drop/selected period
- deterministic Findings panel
- How To Reduce Emissions panel with 3-5 operational actions tied to the current pattern
- strict carbon result-state gating:
  - COMPUTED
  - COMPUTED_ZERO
  - NOT_COMPUTABLE
  - RETRIEVAL_ONLY
  - FORECAST_ONLY
  - UNSUPPORTED
For non-computable carbon states (NOT_COMPUTABLE, RETRIEVAL_ONLY, FORECAST_ONLY, UNSUPPORTED):
- numeric emissions cards are shown as N/A (never a fake 0.00 tCO2e)
- percentage deltas and relative level bars are suppressed
- deterministic and retrieved evidence are shown in separate blocks
- findings and recommendations switch to data-quality guidance
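The gating rule for metric cards can be expressed as a small function. This is a sketch under assumptions: `emissions_card_text` is a hypothetical name, only the state names come from this document, and rejecting unknown states with an error is an illustrative choice.

```python
from typing import Optional

# State names are taken from the result-state list in this document.
COMPUTABLE_STATES = {"COMPUTED", "COMPUTED_ZERO"}
NON_COMPUTABLE_STATES = {
    "NOT_COMPUTABLE",
    "RETRIEVAL_ONLY",
    "FORECAST_ONLY",
    "UNSUPPORTED",
}


def emissions_card_text(state: str, tonnes: Optional[float]) -> str:
    """Return the text for a numeric emissions card (hypothetical helper)."""
    if state in NON_COMPUTABLE_STATES:
        # Never render a number here, not even 0.00.
        return "N/A"
    if state not in COMPUTABLE_STATES:
        raise ValueError(f"unknown carbon result state: {state}")
    if tonnes is None:
        raise ValueError("a computed state requires a numeric value")
    return f"{tonnes:.2f} tCO2e"


print(emissions_card_text("FORECAST_ONLY", None))  # N/A
print(emissions_card_text("COMPUTED_ZERO", 0.0))   # 0.00 tCO2e
```

Keeping the check on the state rather than on the value is what separates a genuine zero (COMPUTED_ZERO) from a missing result.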
Configurable threshold source (single place):
- config/config.yaml -> carbon.relative_level_percentiles (default [0.25, 0.50, 0.75])
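To show how three percentile cut-offs map onto the four relative levels, here is a minimal sketch. The function names, the linear-interpolation percentile, and the handling of values that fall exactly on a cut-off are assumptions; only the default percentiles and the level labels come from this document.

```python
from bisect import bisect_right
from typing import Sequence

LEVELS = ("Low", "Moderate", "High", "Very High")


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def relative_level(
    value: float,
    dataset: Sequence[float],
    percentiles: Sequence[float] = (0.25, 0.50, 0.75),
) -> str:
    """Classify `value` against percentile cut-offs of `dataset`."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    ordered = sorted(dataset)
    cutoffs = [percentile(ordered, q) for q in percentiles]
    # Assumption: a value equal to a cut-off lands in the higher level.
    return LEVELS[bisect_right(cutoffs, value)]


history = [10, 20, 30, 40, 50]      # cut-offs: 20, 30, 40
print(relative_level(15, history))  # Low
print(relative_level(45, history))  # Very High
```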
Run unit tests for emissions presentation logic:
python -m unittest discover -s tests -p "test_*.py"

- If Chroma fails with Python 3.14, recreate .venv with Python 3.12.
- If RAG evidence is unavailable, ensure OPENAI_API_KEY is exported in the same terminal.
- If retrieval is disabled on cloud, check Streamlit secrets for OPENAI_API_KEY and CHROMA_* variables.
- If cloud is still on partial coverage, verify whether the sidebar shows demo_data/processed; if so, either upload the processed bundle or set APP_PROCESSED_BUNDLE_URL.
- If Ask has no deterministic output, run python -m src.kpi.build_kpis ... first.

Streamlit Cloud is not a good target for the full local model because the local Chroma store is several GB. If you later move beyond the free local deployment, use one of these paths:
- FastAPI on a host with disk
- Streamlit on a host with attached storage
- FastAPI + remote Chroma
./run_api.sh

API endpoints:
- GET /health
- POST /ask
- Swagger docs at http://localhost:8000/docs

docker build -t eagle-eye-api -f Dockerfile.api .
docker run --rm -p 8000:8000 \
-e OPENAI_API_KEY="..." \
-e APP_PROCESSED_BUNDLE_URL="https://.../eagle_eye_processed_bundle.tar.gz" \
-e APP_EVENTS_BUNDLE_URL="https://.../eagle_eye_events_bundle.tar.gz" \
-e APP_CHROMA_MANIFEST_URL="https://.../eagle_eye_chroma_manifest.json" \
eagle-eye-api

FastAPI + remote Chroma
- Best when you already operate a Chroma service.
- Set VECTOR_DB_MODE=remote plus CHROMA_* variables.

FastAPI + local bundle bootstrap
- Best when you want one deployed service and can attach disk.
- Set:
  - APP_PROCESSED_BUNDLE_URL
  - APP_EVENTS_BUNDLE_URL
  - APP_CHROMA_BUNDLE_URL
  - or APP_CHROMA_MANIFEST_URL
- Do not set VECTOR_DB_MODE=remote.

This repo also includes render.yaml for the optional API deployment path; it now points to Dockerfile.api.

curl -X POST http://localhost:8000/ask \
-H "Content-Type: application/json" \
-d '{
"question": "What will congestion be at LVVNT on Friday, February 20, 2026?",
"top_k_evidence": 5,
"filters": {"port": "LVVNT"}
}'
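
The same request can be sent from Python using only the standard library. This is a minimal sketch: the endpoint, field names, and example values are copied from the curl call above, the helper names `build_ask_request` and `ask` are hypothetical, and the response is assumed to be JSON because its schema is not described here.

```python
import json
import urllib.request
from typing import Any, Optional


def build_ask_request(
    question: str,
    port: Optional[str] = None,
    top_k_evidence: int = 5,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST /ask request without sending it."""
    payload: dict[str, Any] = {
        "question": question,
        "top_k_evidence": top_k_evidence,
    }
    if port is not None:
        payload["filters"] = {"port": port}
    return urllib.request.Request(
        f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, port: Optional[str] = None, timeout: float = 60.0) -> Any:
    """Send the question to a locally running API and decode the reply.

    Assumes the API is up (for example via ./run_api.sh) and returns JSON.
    """
    request = build_ask_request(question, port=port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `ask("What will congestion be at LVVNT on Friday, February 20, 2026?", port="LVVNT")` issues the same call as the curl example.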