Eagle Eye Congestion + Carbon + Forecast + RAG Evidence

This project uses:

PRJ912.csv (AIS telemetry)
PRJ896.csv (port calls)
Optional docs (NIS2 PDF + public ISPS pages)

It now has three layers:

Deterministic analytics/forecast (source of truth for counts, congestion, trends)
Deterministic carbon inventory (TTW pollutants + WTW CO2e, with uncertainty + provenance)
Optional RAG evidence (representative examples, not numeric truth)

0) Recommended Free Deployment

The recommended fully free deployment is:

run Eagle Eye on your own Mac
use the full local data/processed and data/chroma
expose the current Streamlit UI through a free public tunnel

This is the only realistic way to keep full parity with your local model at zero infrastructure cost.

One-command launcher:

./run_free_public_app.sh

What it does:

checks Docker Desktop is running
checks OPENAI_API_KEY
checks full local assets exist
builds the Streamlit Docker image
runs the UI container on port 8501
opens a free public tunnel (cloudflared by default, ngrok supported)
prints the public URL

Required local inputs:

data/processed/arrivals_daily.parquet
data/processed/events.parquet
data/chroma/chroma.sqlite3
data/chroma/traffic_metadata_index.csv
OPENAI_API_KEY in shell or .env
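
A minimal preflight sketch for the inputs listed above, useful before starting the launcher. It is not taken from run_free_public_app.sh; the function name missing_inputs and the simplified .env check (it only looks for the variable name, not a value) are assumptions made for illustration.

```python
import os
from pathlib import Path

# Paths copied from the "Required local inputs" list.
REQUIRED_FILES = [
    "data/processed/arrivals_daily.parquet",
    "data/processed/events.parquet",
    "data/chroma/chroma.sqlite3",
    "data/chroma/traffic_metadata_index.csv",
]


def missing_inputs(root=".", env=None):
    """Return a list of problems; an empty list means the inputs look ready."""
    env = os.environ if env is None else env
    root = Path(root)
    problems = [
        f"missing file: {rel}" for rel in REQUIRED_FILES if not (root / rel).is_file()
    ]
    # Assumption: a line containing "OPENAI_API_KEY=" in .env counts as configured.
    env_file = root / ".env"
    key_in_dotenv = env_file.is_file() and "OPENAI_API_KEY=" in env_file.read_text()
    if not env.get("OPENAI_API_KEY") and not key_in_dotenv:
        problems.append("OPENAI_API_KEY not set in shell or .env")
    return problems
```

Calling missing_inputs() from the project root and printing each returned entry gives a quick checklist of what still needs to be built or exported.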

Important:

the public URL is temporary
it only stays live while your Mac is on
Docker Desktop and the tunnel process must keep running
this free path does not need Streamlit Cloud, remote Chroma, or hosted bundles
data/chroma is mounted read-write so local retrieval provenance works inside Docker

Optional stable URL modes:

NGROK_DOMAIN=<your-domain.ngrok-free.app> ./run_free_public_app.sh (requires ngrok auth + reserved domain)
CLOUDFLARE_TUNNEL_TOKEN=... CLOUDFLARE_TUNNEL_HOSTNAME=<host.yourdomain.com> ./run_free_public_app.sh

1) Mac Setup

cd "/Users/praharshchintu/Documents/New project"
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt

Set API key (only needed for RAG index + evidence retrieval):

export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

Or copy the example file and set OPENAI_API_KEY in the resulting local .env:

cp .env.example .env

2) Data Inputs

Put files in /Users/praharshchintu/Documents/New project/data/:

PRJ912.csv
PRJ896.csv
CELEX_32022L2555_EN_TXT.pdf (recommended)
ILOIMOCodeOfPracticeEnglish.pdf (optional)

For ISPS, prefer public official pages via URLs, not unofficial full-text PDFs.

3) One-Command Pipeline

./run_demo_pipeline.sh

This runs, in order:

src.predict.data_prep
src.kpi.build_kpis
src.carbon.build
destination / ETA / anomaly training
RAG indexing (if OPENAI_API_KEY is set)
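
The ordering above can be sketched in Python as a list of commands. This is an illustration only, not the contents of run_demo_pipeline.sh: the module names and flags are the ones used in the manual commands of this README, while the helper name pipeline_steps and the exact arguments the script passes are assumptions.

```python
import os
import sys

DATA_ARGS = ["--traffic_csv", "data/PRJ912.csv", "--traffic_csvs", "data/PRJ896.csv"]
TRAIN_ARGS = [
    "--training_rows", "data/processed/training_rows.parquet",
    "--model_dir", "models",
]


def pipeline_steps(env=None):
    """Return the commands in pipeline order; indexing is added only with a key."""
    env = os.environ if env is None else env
    py = [sys.executable, "-m"]
    steps = [
        py + ["src.predict.data_prep", *DATA_ARGS, "--out_dir", "data/processed"],
        py + ["src.kpi.build_kpis", *DATA_ARGS, "--out_dir", "data/processed"],
        py + ["src.carbon.build", "--processed_dir", "data/processed",
              "--out_dir", "data/processed"],
        py + ["src.predict.train_destination", *TRAIN_ARGS],
        py + ["src.predict.train_eta", *TRAIN_ARGS],
        py + ["src.predict.anomaly", *TRAIN_ARGS],
    ]
    # RAG indexing is optional and needs OPENAI_API_KEY.
    if env.get("OPENAI_API_KEY"):
        steps.append(
            py + ["src.index.build_index", *DATA_ARGS, "--persist_dir", "data/chroma"]
        )
    return steps
```

Each entry can be passed to subprocess.run(cmd, check=True) so that a failing step stops the run before later steps consume incomplete outputs.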

4) Manual Commands

Build prediction-ready datasets

python -m src.predict.data_prep \
  --traffic_csv data/PRJ912.csv \
  --traffic_csvs data/PRJ896.csv \
  --out_dir data/processed

Build KPI tables (required for Ask/Forecast tabs)

python -m src.kpi.build_kpis \
  --traffic_csv data/PRJ912.csv \
  --traffic_csvs data/PRJ896.csv \
  --out_dir data/processed

Outputs include:

data/processed/arrivals_daily.parquet
data/processed/arrivals_hourly.parquet
data/processed/dwell_time.parquet
data/processed/occupancy_hourly.parquet
data/processed/congestion_daily.parquet
data/processed/kpi_capabilities.json

Build carbon layer outputs (TTW + WTW + uncertainty + evidence)

python -m src.carbon.build \
  --processed_dir data/processed \
  --out_dir data/processed

Outputs include:

data/processed/carbon_segments.parquet
data/processed/carbon_emissions_segment.parquet
data/processed/carbon_emissions_daily_port.parquet
data/processed/carbon_emissions_call.parquet
data/processed/carbon_evidence.parquet
data/processed/carbon_params_version.json

Train prediction models

python -m src.predict.train_destination --training_rows data/processed/training_rows.parquet --model_dir models
python -m src.predict.train_eta --training_rows data/processed/training_rows.parquet --model_dir models
python -m src.predict.anomaly --training_rows data/processed/training_rows.parquet --model_dir models

Build RAG index (optional but recommended for evidence)

python -m src.index.build_index \
  --traffic_csv data/PRJ912.csv \
  --traffic_csvs data/PRJ896.csv \
  --persist_dir data/chroma \
  --pdf_paths data/CELEX_32022L2555_EN_TXT.pdf data/ILOIMOCodeOfPracticeEnglish.pdf \
  --doc_urls https://www.imo.org/en/OurWork/Security/Pages/SOLAS-XI-2%20ISPS-Code.aspx https://www.mpa.gov.sg/web/portal/home/port-of-singapore/operations-and-services/maritime-security/isps-code

Forecast backtest

python -m src.forecast.backtest --processed_dir data/processed

5) Run Streamlit

./run_streamlit.sh

UI:

Ask-only interface with integrated analytics + forecasting + retrieval evidence trace.
Includes answer, evidence, confidence, chart, method steps, and retrieval provenance.

5.1) Streamlit Cloud Deploy

Use these exact values in Streamlit Cloud:

Repository: Praharsh-Projects/Eagle_Eye
Branch: main
Main file path: app/streamlit_app.py

When data/processed is not present in a cloud environment, the app falls back automatically to the bundled runtime KPI data in demo_data/processed. When data/chroma is not present, it falls back to the bundled demo vector index in demo_data/chroma.

To enable vector retrieval evidence (vector_id, chunk_id, distance), set Streamlit secret:

OPENAI_API_KEY = "..."

To run full-scale retrieval on cloud (same behavior as local), connect a remote Chroma service:

VECTOR_DB_MODE = "remote"
CHROMA_HOST = "<your-chroma-host>"
CHROMA_PORT = "8000" (or your service port)
CHROMA_SSL = "true" (for HTTPS services)
Optional: CHROMA_TENANT, CHROMA_DATABASE, CHROMA_AUTH_TOKEN, CHROMA_AUTH_HEADER
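
A hedged sketch of how these settings could be read and validated at startup. The variable names come from the list above; the ChromaSettings class, the "local" default mode name, and the accepted truthy strings for CHROMA_SSL are assumptions, not the app's actual loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ChromaSettings:
    mode: str
    host: str
    port: int
    ssl: bool


def load_chroma_settings(env=None):
    """Parse VECTOR_DB_MODE and CHROMA_* values from an environment mapping."""
    env = os.environ if env is None else env
    # Assumption: any mode other than "remote" means the local persist directory.
    mode = env.get("VECTOR_DB_MODE", "local").strip().lower()
    if mode != "remote":
        return ChromaSettings(mode=mode, host="", port=0, ssl=False)
    host = env.get("CHROMA_HOST", "").strip()
    if not host:
        raise ValueError("VECTOR_DB_MODE=remote requires CHROMA_HOST")
    port = int(env.get("CHROMA_PORT", "8000"))
    # Assumption: these strings are treated as "true".
    ssl = env.get("CHROMA_SSL", "false").strip().lower() in {"1", "true", "yes"}
    return ChromaSettings(mode=mode, host=host, port=port, ssl=ssl)
```

Failing early on a missing host makes a misconfigured secret visible at startup instead of at the first retrieval call.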

To bootstrap full processed runtime data on cloud from a hosted bundle, set:

APP_PROCESSED_BUNDLE_URL = "https://.../eagle_eye_processed_bundle.tar.gz"
Optional for anomaly/jump detection without retriever:
APP_EVENTS_BUNDLE_URL = "https://.../eagle_eye_events_bundle.tar.gz"
Optional for local-bundle retrieval fallback on hosts with enough disk:
APP_CHROMA_BUNDLE_URL = "https://.../eagle_eye_chroma_bundle.tar.gz"
Or, for large Chroma stores split across multiple hosted files:
APP_CHROMA_MANIFEST_URL = "https://.../eagle_eye_chroma_manifest.json"

Create the bundles locally:

python -m src.utils.package_cloud_bundle \
  --processed_dir data/processed \
  --out dist/eagle_eye_processed_bundle.tar.gz \
  --events_out dist/eagle_eye_events_bundle.tar.gz \
  --chroma_dir data/chroma \
  --chroma_out dist/eagle_eye_chroma_bundle.tar.gz

Index directly to remote service:

export VECTOR_DB_MODE=remote
export CHROMA_HOST=<your-chroma-host>
export CHROMA_PORT=8000
export CHROMA_SSL=true
python -m src.index.build_index \
  --traffic_csv data/PRJ912.csv \
  --traffic_csvs data/PRJ896.csv \
  --persist_dir data/chroma

Cloud parity summary:

Deterministic analytics/forecast parity: bundled in demo_data/processed, or bootstrap via APP_PROCESSED_BUNDLE_URL
Retrieval parity: not realistic on free Streamlit Cloud with the full local vector store
AIS jump/spoof anomaly parity without retriever: requires APP_EVENTS_BUNDLE_URL because those queries need row-level AIS events
On non-Streamlit hosts with enough disk, APP_CHROMA_BUNDLE_URL or APP_CHROMA_MANIFEST_URL can bootstrap a local full vector store

6) Congestion Definition (used in code)

Daily congestion proxy per port:

arrivals_ratio = arrivals_day / median(arrivals_port)
dwell_ratio = median_dwell_day / median(dwell_port)
congestion_index = arrivals_ratio + dwell_ratio

If dwell is unavailable, congestion falls back to arrivals-only ratio.
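
A small pure-Python sketch of the definition above, including the arrivals-only fallback. The function names and the choice to raise on a zero port median are assumptions for illustration; the project code may handle that case differently.

```python
from statistics import median


def _ratio(value, history):
    """Divide a daily value by the port-level median of its history."""
    baseline = median(history)
    if baseline == 0:
        # Assumption: a zero baseline is treated as an error, not as zero congestion.
        raise ValueError("port median is zero; ratio is undefined")
    return value / baseline


def congestion_index(arrivals_day, arrivals_port, median_dwell_day=None, dwell_port=None):
    """arrivals_ratio + dwell_ratio, or arrivals_ratio alone when dwell is missing."""
    arrivals_ratio = _ratio(arrivals_day, arrivals_port)
    if median_dwell_day is None or not dwell_port:
        return arrivals_ratio
    return arrivals_ratio + _ratio(median_dwell_day, dwell_port)
```

For example, 15 arrivals against a port median of 10 gives an arrivals ratio of 1.5; with a daily median dwell of 6 hours against a port median of 4 hours, the dwell ratio is also 1.5 and the index is 3.0. Because the fallback drops one term, arrivals-only values sit on a different scale from two-term values and should not be compared directly.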

7) Supported vs Unsupported

Supported well:

arrivals volume, busiest day/hour, dwell proxy, congestion proxy, historical-pattern forecasts
TTW pollutants (CO2e, NOx, SOx, PM) and WTW CO2e with confidence + uncertainty intervals

Out of scope (clean refusal):

berth crane utilization
gate queue length
TEU throughput
yard occupancy from terminal ops systems

8) Demo Questions

How many vessels arrived at LUBECK in March 2022?
Is Friday usually busier than Monday at LVVNT?
What will congestion look like next Friday at LUBECK?
Why was LVVNT congested on 2021-01-01?
Any unusual spikes in arrivals at GDANSK in 2021-02?
What are TTW emissions at SEGOT in March 2022 for CO2e, NOx, SOx, and PM?
Show WTW CO2e emissions at LVVNT between 2022-02-01 and 2022-02-28.

9) Carbon Measurement and Decision-Support UX

Carbon/emissions outputs are standardized with shared formatting and interpretation helpers:

Absolute greenhouse-gas values are shown in tCO2e and auto-scale to ktCO2e / MtCO2e for large totals.
Intensity metrics are shown with explicit units such as:
- kgCO2e/vessel-call
- tCO2e/day
- kgCO2e/hour
Congestion stays dimensionless and is labelled as index.
Maritime operational units remain:
- distance = nautical miles (nm)
- speed = knots (kn)
- time = UTC (24-hour)
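
The auto-scaling of absolute greenhouse-gas values could look like the sketch below. The function name format_co2e, the switch points at 1,000 and 1,000,000 tonnes, and the two-decimal formatting are assumptions; the shared formatting helpers in the repo may use different thresholds.

```python
def format_co2e(tonnes):
    """Format a value given in tCO2e, scaling to ktCO2e or MtCO2e for large totals."""
    magnitude = abs(tonnes)
    if magnitude >= 1_000_000:
        return f"{tonnes / 1_000_000:.2f} MtCO2e"
    if magnitude >= 1_000:
        return f"{tonnes / 1_000:.2f} ktCO2e"
    return f"{tonnes:.2f} tCO2e"
```

Under these assumptions, 12.5 renders as "12.50 tCO2e", 4200 as "4.20 ktCO2e", and 3,500,000 as "3.50 MtCO2e".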

Carbon result views now include:

unit-aware metric cards
relative comparison bar (Low/Moderate/High/Very High) based on dataset percentiles
chart annotations (Finding: ...) for highest/lowest/spike/drop/selected period
deterministic Findings panel
How To Reduce Emissions panel with 3-5 operational actions tied to the current pattern
strict carbon result-state gating:
- COMPUTED
- COMPUTED_ZERO
- NOT_COMPUTABLE
- RETRIEVAL_ONLY
- FORECAST_ONLY
- UNSUPPORTED

For non-computable carbon states (NOT_COMPUTABLE, RETRIEVAL_ONLY, FORECAST_ONLY, UNSUPPORTED):

numeric emissions cards are shown as N/A (never fake 0.00 tCO2e)
percentage deltas and relative level bars are suppressed
deterministic and retrieved evidence are shown in separate blocks
findings and recommendations switch to data-quality guidance
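
The gating rules above can be sketched as follows. The state names are the ones listed in this section; the CarbonState enum, the emissions_card helper, and the returned dictionary keys are hypothetical and only illustrate the "N/A instead of a fake zero" behaviour.

```python
from enum import Enum


class CarbonState(Enum):
    COMPUTED = "COMPUTED"
    COMPUTED_ZERO = "COMPUTED_ZERO"
    NOT_COMPUTABLE = "NOT_COMPUTABLE"
    RETRIEVAL_ONLY = "RETRIEVAL_ONLY"
    FORECAST_ONLY = "FORECAST_ONLY"
    UNSUPPORTED = "UNSUPPORTED"


# Only these states carry a real numeric result.
NUMERIC_STATES = {CarbonState.COMPUTED, CarbonState.COMPUTED_ZERO}


def emissions_card(state, tonnes=None):
    """Return display fields for one metric card, gated on the result state."""
    if state not in NUMERIC_STATES or tonnes is None:
        # Non-computable: no number, no percentage delta, no relative level bar.
        return {"value": "N/A", "show_delta": False, "show_level_bar": False}
    return {"value": f"{tonnes:.2f} tCO2e", "show_delta": True, "show_level_bar": True}
```

Keeping COMPUTED_ZERO separate from NOT_COMPUTABLE is what lets a genuine zero be shown as 0.00 tCO2e while a missing computation is shown as N/A.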

Configurable threshold source (single place):

config/config.yaml -> carbon.relative_level_percentiles (default [0.25, 0.50, 0.75])
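
As an illustration of how the percentile thresholds map a value to the Low/Moderate/High/Very High bar, here is a sketch using the default [0.25, 0.50, 0.75]. The linear-interpolation percentile and the rule that a value equal to a cutoff moves to the higher level are assumptions; the project code may compute percentiles differently.

```python
from bisect import bisect_right

LEVELS = ["Low", "Moderate", "High", "Very High"]


def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] over sorted values."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def relative_level(value, history, percentiles=(0.25, 0.50, 0.75)):
    """Classify value against percentile cutoffs computed from history."""
    ordered = sorted(history)
    cutoffs = [percentile(ordered, q) for q in percentiles]
    # Assumption: a value equal to a cutoff is placed in the higher level.
    return LEVELS[bisect_right(cutoffs, value)]
```

With a history of [10, 20, 30, 40, 50] the cutoffs are 20, 30, and 40, so 15 is Low, 25 is Moderate, 35 is High, and 45 is Very High.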

10) Tests

Run unit tests for emissions presentation logic:

python -m unittest discover -s tests -p "test_*.py"

11) Troubleshooting

If Chroma fails with Python 3.14, recreate .venv with Python 3.12.
If RAG evidence is unavailable, ensure OPENAI_API_KEY is exported in the same terminal.
If retrieval is disabled on cloud, check Streamlit secrets for OPENAI_API_KEY and CHROMA_* variables.
If cloud is still on partial coverage, verify whether the sidebar shows demo_data/processed; if so, either upload the processed bundle or set APP_PROCESSED_BUNDLE_URL.
If Ask has no deterministic output, run python -m src.kpi.build_kpis ... first.

12) Optional Hosted Deployment Alternatives

Streamlit Cloud is not a good target for the full local model because the local Chroma store is several GB. If you later move beyond the free local deployment, use one of these paths:

FastAPI on a host with disk
Streamlit on a host with attached storage
FastAPI + remote Chroma

Run locally

./run_api.sh

API endpoints:

GET /health
POST /ask
Swagger docs at http://localhost:8000/docs

Docker run (API path)

docker build -t eagle-eye-api -f Dockerfile.api .
docker run --rm -p 8000:8000 \
  -e OPENAI_API_KEY="..." \
  -e APP_PROCESSED_BUNDLE_URL="https://.../eagle_eye_processed_bundle.tar.gz" \
  -e APP_EVENTS_BUNDLE_URL="https://.../eagle_eye_events_bundle.tar.gz" \
  -e APP_CHROMA_MANIFEST_URL="https://.../eagle_eye_chroma_manifest.json" \
  eagle-eye-api

Recommended production modes

FastAPI + remote Chroma

Best when you already operate a Chroma service.
Set VECTOR_DB_MODE=remote plus CHROMA_* variables.

FastAPI + local bundle bootstrap

Best when you want one deployed service and can attach disk.
Set:
- APP_PROCESSED_BUNDLE_URL
- APP_EVENTS_BUNDLE_URL
- APP_CHROMA_BUNDLE_URL or APP_CHROMA_MANIFEST_URL
Do not set VECTOR_DB_MODE=remote.

This repo also includes render.yaml for the optional API deployment path; it points to Dockerfile.api.

Example request

curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What will congestion be at LVVNT on Friday, February 20, 2026?",
    "top_k_evidence": 5,
    "filters": {"port": "LVVNT"}
  }'
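
The same request can be built from Python with only the standard library. The endpoint and payload fields mirror the curl example; the helper name build_ask_request is an assumption, and the response schema is not documented here, so the usage note simply prints whatever JSON comes back.

```python
import json
from urllib import request


def build_ask_request(question, port=None, top_k_evidence=5,
                      base_url="http://localhost:8000"):
    """Build a POST /ask request with the same JSON body as the curl example."""
    payload = {"question": question, "top_k_evidence": top_k_evidence}
    if port:
        payload["filters"] = {"port": port}
    return request.Request(
        f"{base_url}/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the API running locally, open the request with request.urlopen(req) in a with block and pass the response to json.load to inspect the result.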
