AI-powered toolkit for turning United Nations WebTV sessions into structured, research-ready knowledge with automated transcription, entity extraction, analytics, and an interactive chat surface.
- UN WebTV ingestion & session catalog: capture metadata from public session URLs and keep analyses searchable.
- Transcription with diarization: leverage Azure OpenAI (GPT-4o Transcribe & Whisper) for high-fidelity, speaker-aware transcripts.
- Entity & SDG extraction: identify speakers, countries, organizations, themes, treaties, SDGs, sentiment, and key decisions.
- Vector-powered semantic search: index transcript segments in Azure AI Search for lightning-fast retrieval (see the query sketch after this list).
- AI research copilot: RAG-style chat UI grounded in transcript segments with citations and source timestamps.
- Analytics & visualizations: Streamlit dashboards surface speaker participation, topic trends, and geographic coverage.
- Export & collaboration: download transcripts, summaries, and analysis artifacts to share with research teams.
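The semantic search feature relies on an Azure AI Search index of transcript segments. A minimal query sketch, assuming the endpoint and index name from the configuration below; the field names (`session_id`, `start_time`, `content`) are illustrative, not the project's actual schema:

```python
# Hypothetical query sketch -- not the project's retrieval code.
# Field names (session_id, start_time, content) are assumptions about the index schema.
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name=os.environ.get("SEARCH_INDEX_NAME", "untv-segments"),
    credential=AzureKeyCredential(os.environ["SEARCH_API_KEY"]),
)

# Plain keyword search; the platform itself may use vector or hybrid queries.
results = client.search(search_text="climate finance commitments", top=5)
for doc in results:
    print(doc.get("session_id"), doc.get("start_time"), str(doc.get("content", ""))[:120])
```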
- Python 3.11+
- FFmpeg and ffprobe (e.g., `brew install ffmpeg` on macOS or `sudo apt install ffmpeg` on Ubuntu)
- Azure subscription with access to OpenAI, Speech Services, Cosmos DB, AI Search, and Blob Storage
- Git
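If you want to confirm the FFmpeg tooling is visible before running any ingestion, an optional sanity check (my addition, not part of the project's scripts):

```python
# Optional sanity check: confirm ffmpeg/ffprobe are on PATH before ingesting audio.
import shutil

for tool in ("ffmpeg", "ffprobe"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'NOT FOUND - install it before running the pipeline'}")
```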
```bash
git clone <your-repo-url>
cd un-webcast-simple
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```

Create a `.env` file (or use your secret manager of choice) with the configuration keys expected by `config/settings.py`. A minimal example:

```env
APP_NAME="UN WebTV Analysis Platform"
AZURE_OPENAI_API_KEY="..."
AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
AZURE_OPENAI_DEPLOYMENT_NAME="gpt-4o-unga"
AZURE_TRANSCRIBE_DIARIZE_DEPLOYMENT_NAME="gpt-4o-transcribe-diarize"
AZURE_SPEECH_KEY="..."
AZURE_SPEECH_REGION="eastus2"
COSMOS_ENDPOINT="https://<your-account>.documents.azure.com:443/"
COSMOS_KEY="..."
COSMOS_DATABASE_NAME="untv_analysis"
BLOB_CONNECTION_STRING="DefaultEndpointsProtocol=...;"
BLOB_CONTAINER_AUDIO="audio-temp"
BLOB_CONTAINER_TRANSCRIPTS="transcripts"
SEARCH_ENDPOINT="https://<your-search>.search.windows.net"
SEARCH_API_KEY="..."
SEARCH_INDEX_NAME="untv-segments"
```

Refer to `config/settings.py` for the full list of configurable options (deployment names, rate limits, logging paths, etc.).
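Since `config/` is built on Pydantic settings (see the project structure below), here is a rough sketch of how such a module typically maps the `.env` values above. The field names and defaults are assumptions; the repo's `config/settings.py` is authoritative:

```python
# Hypothetical sketch of a Pydantic settings class reading the .env above.
# config/settings.py in the repo defines the authoritative fields and defaults.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    app_name: str = "UN WebTV Analysis Platform"
    azure_openai_api_key: str
    azure_openai_endpoint: str
    azure_openai_deployment_name: str = "gpt-4o-unga"
    cosmos_endpoint: str
    cosmos_key: str
    search_endpoint: str
    search_api_key: str
    search_index_name: str = "untv-segments"


settings = Settings()  # fails fast with a validation error if required keys are missing
print(settings.azure_openai_endpoint)
```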
```bash
streamlit run app.py
```

Optional: if you split the API backend and the UI, expose any FastAPI routes with Uvicorn (e.g., `uvicorn backend.api:app --reload`) before launching the UI.
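Because the FastAPI surface under `backend/api/` is still marked as coming soon, the following is only a hypothetical minimal app showing what the `uvicorn backend.api:app --reload` command expects (a module exposing an `app` object):

```python
# Hypothetical backend/api module (the real one is marked "coming soon" in the
# project structure); shown only to illustrate the uvicorn command above.
from fastapi import FastAPI

app = FastAPI(title="UN WebTV Analysis API")


@app.get("/health")
def health() -> dict[str, str]:
    # Lightweight readiness probe for the Streamlit UI or a deployment platform.
    return {"status": "ok"}
```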
```text
un-webcast-simple/
├── app.py              # Streamlit entry point
├── pages/              # Additional Streamlit pages (visualizations, catalog, etc.)
├── backend/
│   ├── services/       # Ingestion, audio processing, OpenAI, database helpers
│   ├── models/         # Pydantic data models
│   └── api/            # FastAPI surface (coming soon)
├── config/             # Pydantic settings and configuration helpers
├── scripts/            # Operational scripts (maintenance, utilities)
├── tests/              # Automated test suite
├── docs/               # Architecture and deployment docs (extend as needed)
└── requirements.txt    # Python dependencies
```
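To illustrate what lives under `backend/models/`, here is a hypothetical transcript-segment model; the field names are illustrative and the repo's actual Pydantic models are authoritative:

```python
# Hypothetical shape of a transcript segment under backend/models/;
# the repo's actual Pydantic models are authoritative.
from datetime import datetime

from pydantic import BaseModel


class TranscriptSegment(BaseModel):
    session_id: str
    speaker: str | None = None
    start_seconds: float
    end_seconds: float
    text: str
    created_at: datetime | None = None
```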
```bash
pytest          # run unit/integration tests
pytest --cov    # include coverage reporting
black .         # format code
flake8          # lint
mypy .          # static type checking
```

Manual diagnostic scripts for Azure integrations live in `scripts/manual/`. Run them directly with `python scripts/manual/<script_name>.py` once your environment is configured.
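As an example of the kind of check such a diagnostic script might perform, here is a hypothetical connectivity probe against the Azure OpenAI deployment; the API version, deployment fallback, and prompt are illustrative, not taken from the repo:

```python
# Hypothetical connectivity probe in the spirit of scripts/manual/; the API version,
# deployment fallback, and prompt are illustrative, not taken from the repo.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model=os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-unga"),
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(response.choices[0].message.content)
```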
- Architecture – system design and processing pipeline
- Add API specs, deployment runbooks, and contributor guidelines before public release (see checklist below).
Issues and pull requests are welcome. Please open a discussion if you plan significant changes so we can align on direction and Azure resource usage. See CONTRIBUTING.md and follow the Code of Conduct.
Distributed under the MIT License.