Convert YouTube travel vlogs into interactive map data and route visualizations.
TrailTag extracts meaningful places, timestamps, and routes from travel videos so viewers and developers can replay journeys on a map, inspect points-of-interest (POIs), and consume concise topic summaries.
- Developers and data engineers who want to automatically convert travel videos into geospatial data
- Frontend or product engineers who need map-ready GeoJSON for visualization
- Contributors evaluating the project on GitHub
- Ingest a YouTube video (ID or URL) or supplied subtitles/metadata
- Extract timestamps, named places and POIs mentioned in subtitles/descriptions
- Geocode place names to coordinates (configurable provider) and assemble routes (LineString)
- Output GeoJSON for routes and points with useful properties (time, label, confidence)
- Provide a FastAPI backend, CLI crew for offline processing, and a browser extension to trigger analyses from the YouTube UI
- Support CrewAI Memory-based caching and asynchronous task status reporting
- Input: a YouTube video ID / URL, or pre-parsed subtitles + timestamps JSON
- Output: asynchronous task_id; final result available as JSON/GeoJSON
  - route: LineString
  - points: FeatureCollection of Point features
  - status: pending | running | done | failed

Video fetching and pre-processing
- Download or parse YouTube metadata (title, description, upload date)
- Retrieve subtitles (auto-generated or uploaded) and normalize timestamped text and chapters
- Subtitle detection with user warnings for videos without available subtitles
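The normalization step can be sketched as follows. This is a minimal illustration, assuming subtitles arrive as `(timestamp, text)` pairs with `HH:MM:SS`, `MM:SS` or `HH:MM:SS.mmm` start times; the names `Cue`, `parse_timestamp` and `normalize_cues` are hypothetical and not taken from the TrailTag codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One normalized subtitle cue (hypothetical shape)."""
    start_seconds: float
    text: str


def parse_timestamp(value: str) -> float:
    """Convert 'HH:MM:SS', 'MM:SS' or 'HH:MM:SS.mmm' into seconds."""
    seconds = 0.0
    for part in value.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def normalize_cues(raw_cues: list[tuple[str, str]]) -> list[Cue]:
    """Drop empty cues, collapse whitespace, and sort by start time."""
    cues = []
    for stamp, text in raw_cues:
        cleaned = " ".join(text.split())
        if cleaned:
            cues.append(Cue(parse_timestamp(stamp), cleaned))
    return sorted(cues, key=lambda cue: cue.start_seconds)


raw = [
    ("00:05:22", "  Now we are at the\nEiffel Tower "),
    ("00:01:30", "Leaving the hotel"),
    ("00:02:00", "   "),  # whitespace-only cue, dropped
]
for cue in normalize_cues(raw):
    print(cue.start_seconds, cue.text)
```

Sorting by start time keeps later stages (topic extraction, route reconstruction) independent of the order in which the subtitle source delivered its cues.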

Topic extraction and time-aligned summaries
- Lightweight NLP to extract main topics, key sentences and keywords
- Produce short, time-aligned summaries useful for map popups
- Smart token management with intelligent chunking for long videos
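One way to picture chunking for long videos is a greedy grouping of consecutive subtitle segments under a token budget. The sketch below is an assumption about how such chunking could work, not TrailTag's implementation; the 4-characters-per-token estimate and the names `estimate_tokens` and `chunk_segments` are illustrative only.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def chunk_segments(segments: list[str], max_tokens: int) -> list[list[str]]:
    """Group consecutive segments so each chunk stays within max_tokens.

    A single segment larger than the budget gets a chunk of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for segment in segments:
        cost = estimate_tokens(segment)
        # Close the current chunk before it would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(segment)
        used += cost
    if current:
        chunks.append(current)
    return chunks


segments = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print([len(chunk) for chunk in chunk_segments(segments, max_tokens=20)])
```

Keeping segments whole and in order preserves the timestamps attached to each segment, which matters when summaries must stay time-aligned.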

POI extraction and geocoding
- Detect place names, addresses and landmarks from subtitles, descriptions and chapters
- Support configurable geocoding providers (e.g. Nominatim, Google Geocoding)
- Return coordinates, provider source and confidence score
- Multi-source data extraction from video descriptions, chapters, and comments
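A configurable provider can be modeled as a small interface that returns coordinates together with the provider name and a confidence score. The following is a hedged sketch: `GeocodeResult`, `Geocoder`, `StaticGeocoder` and `geocode_first` are hypothetical names, and the offline `StaticGeocoder` merely stands in for a real provider such as Nominatim or Google Geocoding.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class GeocodeResult:
    name: str
    lat: float
    lng: float
    provider: str
    confidence: float


class Geocoder(Protocol):
    """Anything with a geocode() method can act as a provider."""

    def geocode(self, name: str) -> Optional[GeocodeResult]: ...


class StaticGeocoder:
    """Offline stand-in for a real provider, useful for tests."""

    def __init__(self, table: dict[str, tuple[float, float]]) -> None:
        self._table = table

    def geocode(self, name: str) -> Optional[GeocodeResult]:
        coords = self._table.get(name.strip().lower())
        if coords is None:
            return None
        lat, lng = coords
        return GeocodeResult(name, lat, lng, provider="static", confidence=1.0)


def geocode_first(name: str, providers: list[Geocoder]) -> Optional[GeocodeResult]:
    """Try providers in order and return the first hit, or None."""
    for provider in providers:
        result = provider.geocode(name)
        if result is not None:
            return result
    return None


providers = [StaticGeocoder({"eiffel tower": (48.8584, 2.2945)})]
print(geocode_first("Eiffel Tower", providers))
```

Recording the provider on each result makes it possible to compare sources later or to re-run only the lookups that came from a low-confidence provider.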

Route reconstruction
- Merge time-ordered POIs and detected locations into one or multiple LineStrings
- Include properties such as start_time, end_time, duration and source_video_time on features
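Route reconstruction can be illustrated by sorting POIs by their video time and emitting a GeoJSON LineString Feature. This sketch assumes each POI is a dict with `time` (`HH:MM:SS`), `lat` and `lng` keys and that `duration` is expressed in seconds; `to_seconds` and `build_route` are hypothetical helpers, not TrailTag functions.

```python
def to_seconds(stamp: str) -> int:
    """Convert an 'HH:MM:SS' string into whole seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_route(video_id: str, pois: list[dict]) -> dict:
    """Build a GeoJSON LineString Feature from time-ordered POIs."""
    ordered = sorted(pois, key=lambda poi: to_seconds(poi["time"]))
    if len(ordered) < 2:
        raise ValueError("a LineString needs at least two positions")
    start, end = ordered[0]["time"], ordered[-1]["time"]
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude].
            "coordinates": [[poi["lng"], poi["lat"]] for poi in ordered],
        },
        "properties": {
            "video_id": video_id,
            "start_time": start,
            "end_time": end,
            "duration": to_seconds(end) - to_seconds(start),  # seconds (assumption)
        },
    }


pois = [
    {"title": "Eiffel Tower", "time": "00:05:22", "lat": 48.8584, "lng": 2.2945},
    {"title": "Louvre", "time": "00:01:30", "lat": 48.8606, "lng": 2.3376},
]
route = build_route("abc", pois)
print(route["properties"])
```

Note the coordinate order: GeoJSON stores longitude first, which is the opposite of the latitude-first order most geocoders and people use.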

Backend API and task management
- Submit jobs asynchronously (returns task_id)
- Poll status and download results (JSON/GeoJSON)
- Optional SSE or WebSocket progress updates for real-time UI
- Enhanced state management with persistent job tracking and recovery

Memory and persistence
- CrewAI Memory system as primary storage with vector search capabilities
- Native in-memory persistence with enhanced data consistency
- Performance monitoring with Langtrace integration and detailed metrics

Browser extension
- A popup UI to request analysis while watching YouTube and view the returned GeoJSON on a map
- Integrates with the backend API to fetch and render GeoJSON layers
- Improved map performance with marker clustering and optimized rendering
- Smart badge system that displays TrailTag availability status on the extension icon at a glance

CLI and automation
- A crew-style CLI to run single-video jobs programmatically
- Suitable for CI or scheduled cron jobs
- Built-in persistence with automatic data consistency management

POST /api/analyze
- Description: Submit a video analysis job
- Example request body:

      { "video_id": "YOUTUBE_VIDEO_ID", "callback_url": "https://example.com/webhook", "options": {} }

- Example response:

      { "task_id": "...", "status": "pending" }

GET /api/status/{task_id}
- Description: Check job status with detailed progress
- Example response:

      { "task_id": "...", "status": "running", "progress": 75, "phase": "geocoding", "subtitle_availability": "available", "estimated_completion": "2024-01-01T12:30:00Z" }

GET /api/results/{task_id}
- Description: Download job results (JSON/GeoJSON)
- Returns a GeoJSON FeatureCollection containing route and points

GET /api/map/{task_id}.geojson
- Description: Directly fetch a map-ready GeoJSON file

GET /health
- Description: Service health check with comprehensive system status
- Example response:

      { "status": "healthy", "memory_system": "operational", "subtitle_detection": "active", "performance_monitoring": "enabled" }

GET /metrics
- Description: Performance metrics and system statistics
- Example response:

      { "total_jobs_processed": 1247, "average_processing_time": 45.7, "memory_usage_mb": 512, "active_jobs": 3, "uptime_hours": 72, "langtrace_enabled": true }

GET /api/memory/stats
- Description: CrewAI Memory system statistics
- Returns memory usage, entry counts, and performance metrics

POST /api/webhooks
- Description: Webhook endpoint for external notifications
- Supports job completion, error alerts, and system events

GET /api/execution/{task_id}
- Description: Detailed task execution information
- Returns execution timeline, agent performance, and debugging data

(Actual endpoints and parameters are implemented in `src/api/routes.py`; consult the code for the canonical contract.)
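The submit, poll and download flow described by these endpoints might be driven from a client roughly as follows. This is a sketch under assumptions: it uses only the standard library, treats `done`, `completed` and `failed` as terminal statuses, and the helper names `http_json` and `analyze_and_wait` are hypothetical. The transport is injectable so the flow can be exercised without a running backend.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Both "done" and "completed" appear as success statuses, so accept either.
TERMINAL_STATUSES = {"done", "completed", "failed"}


def http_json(url: str, payload: Optional[dict] = None) -> dict:
    """Send a GET, or a JSON POST when payload is given, and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def analyze_and_wait(
    base_url: str,
    video_id: str,
    fetch: Callable[..., dict] = http_json,
    poll_seconds: float = 2.0,
    max_polls: int = 300,
) -> dict:
    """Submit a job, poll its status, and return the results payload."""
    task_id = fetch(f"{base_url}/api/analyze", {"video_id": video_id})["task_id"]
    for _ in range(max_polls):
        status = fetch(f"{base_url}/api/status/{task_id}")
        if status["status"] in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    else:
        # The loop ran out without reaching a terminal status.
        raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
    if status["status"] == "failed":
        raise RuntimeError(f"task {task_id} failed")
    return fetch(f"{base_url}/api/results/{task_id}")
```

Against a local development server this would be called as `analyze_and_wait("http://localhost:8010", "YOUTUBE_VIDEO_ID")`. A real client would likely also want backoff and handling for HTTP errors, which are omitted here.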
- Route (LineString) example properties:

      {
        "type": "Feature",
        "geometry": {
          "type": "LineString",
          "coordinates": [...]
        },
        "properties": {
          "video_id": "abc",
          "start_time": "00:01:30",
          "end_time": "00:12:45",
          "source": "detected"
        }
      }

- POI (Point) example properties:

      {
        "type": "Feature",
        "geometry": {
          "type": "Point",
          "coordinates": [lng, lat]
        },
        "properties": {
          "title": "Eiffel Tower",
          "time": "00:05:22",
          "confidence": 0.89,
          "source": "subtitle"
        }
      }

Prerequisites
- Python 3.11+ (see `pyproject.toml`)
- Node.js + npm (for the browser extension)
- CrewAI Memory system (included in dependencies)
Start the backend in development mode (uvicorn):

    uvicorn src.api.main:app --host 0.0.0.0 --port 8010 --reload

Run the TrailTag crew CLI for a single video:

    python -m src.trailtag.main VIDEO_ID

Develop and package the extension:

    cd src/extension
    npm install
    npm test
    npm run package

Run tests
- Unit tests: `pytest` (Python) or `cd src/extension && npm test` (Extension)
- Integration tests: `uv run pytest tests/integration/test_memory_migration.py -v` (Memory system validation)
- End-to-end tests: `uv run python run_e2e_tests.py` (Complete workflow validation)
- Memory system testing: `uv run pytest tests/integration/test_memory_migration.py -v` (Memory system validation)
- `API_HOST` (default: 0.0.0.0)
- `API_PORT` (default: 8010)
- `OPENAI_API_KEY`: for accessing the OpenAI API
- `GOOGLE_API_KEY`: for accessing the Google API
- `CREW_MEMORY_STORAGE_PATH`: CrewAI Memory storage location (default: ./memory_storage)
- `CREW_MEMORY_EMBEDDER_PROVIDER`: embedding provider (default: openai)
- `LANGTRACE_API_KEY`: Langtrace API key for performance tracing
- `ENABLE_PERFORMANCE_MONITORING`: enable/disable monitoring (default: true)
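For illustration, these variables and their documented defaults could be read into a typed settings object as shown below. The `Settings` class and `load_settings` function are hypothetical, and the set of accepted truthy spellings for `ENABLE_PERFORMANCE_MONITORING` is an assumption; the project's own loader may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_host: str
    api_port: int
    memory_storage_path: str
    memory_embedder_provider: str
    monitoring_enabled: bool
    openai_api_key: Optional[str]
    google_api_key: Optional[str]
    langtrace_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, falling back to the documented defaults."""
    flag = env.get("ENABLE_PERFORMANCE_MONITORING", "true").strip().lower()
    return Settings(
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8010")),
        memory_storage_path=env.get("CREW_MEMORY_STORAGE_PATH", "./memory_storage"),
        memory_embedder_provider=env.get("CREW_MEMORY_EMBEDDER_PROVIDER", "openai"),
        # Accepted truthy spellings are an assumption of this sketch.
        monitoring_enabled=flag in {"1", "true", "yes", "on"},
        openai_api_key=env.get("OPENAI_API_KEY"),
        google_api_key=env.get("GOOGLE_API_KEY"),
        langtrace_api_key=env.get("LANGTRACE_API_KEY"),
    )


settings = load_settings({"API_PORT": "9000"})
print(settings.api_host, settings.api_port)
```

Passing the environment as a mapping keeps the loader easy to test, since a plain dict can replace `os.environ`.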
- Small test deployment: a single uvicorn instance with built-in CrewAI Memory
- Production: containerize (Docker), run multiple instances behind a load balancer with persistent storage
- Geocoding providers often have rate limits — use CrewAI Memory caching and provider API keys appropriately
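The rate-limit advice can be made concrete with a wrapper that caches lookups and spaces out calls to the provider. In this sketch a plain dict stands in for CrewAI Memory (it does not use the CrewAI API), the one-second default interval is an assumption to be tuned per provider, and `CachedGeocoder` is a hypothetical name.

```python
import time
from typing import Callable, Optional

Coordinates = tuple[float, float]


class CachedGeocoder:
    """Cache results and enforce a minimum interval between provider calls."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[Coordinates]],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._lookup = lookup
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: dict[str, Optional[Coordinates]] = {}
        self._last_call: Optional[float] = None

    def geocode(self, name: str) -> Optional[Coordinates]:
        # Normalize case and whitespace so near-identical names share an entry.
        key = " ".join(name.lower().split())
        if key in self._cache:
            return self._cache[key]
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        result = self._lookup(name)
        self._cache[key] = result  # misses are cached too, to avoid retries
        return result


geocoder = CachedGeocoder(lambda name: (48.8584, 2.2945), min_interval=0.0)
print(geocoder.geocode("Eiffel Tower"), geocoder.geocode("eiffel  tower"))
```

Caching misses as well as hits avoids spending quota on names that a provider has already failed to resolve; whether that is desirable depends on how often the provider's data changes.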
The TrailTag extension now features an intelligent badge system that displays the current YouTube video's TrailTag availability status directly on the extension icon:
| Badge Display | Icon State | Meaning | Description |
|---|---|---|---|
| ✓ Green Badge | Available | TrailTag can analyze this video | Video has available subtitles for location analysis |
| ! Orange Badge | Unavailable | TrailTag cannot analyze this video | Video lacks subtitles or subtitles are unavailable |
| ... Blue Badge | Checking | Checking video status | System is detecting subtitle availability |
| No Badge | Not YouTube | Current page is not a YouTube video | Please use TrailTag on YouTube video pages |
- Real-time Detection: Automatically detects subtitle availability as users browse YouTube videos
- Visual Feedback: Know if TrailTag is available without opening the extension
- Smart Updates: Supports YouTube's single-page app navigation with instant status updates
- Performance Optimized: Uses background scripts to avoid impacting page load speeds
- Multi-language Support: Detects both manual subtitles and auto-captions
- Background Script: Monitors tab changes and YouTube navigation
- Content Script: Detects subtitle availability on the page
- Badge API: Uses Chrome Extension Badge API to update icon status
- State Synchronization: Extension popup and badge status stay in sync
The TrailTag Chrome extension implements a robust state management system for handling video analysis workflows. The system coordinates between extension states, API responses, and UI views to provide a seamless user experience.
| Extension State | UI View | Description | Persistence |
|---|---|---|---|
| `IDLE` | `home-view` | Ready for new video analysis | None |
| `CHECKING_CACHE` | `loading-view` | Checking for existing analysis results | None |
| `ANALYZING` | `analyzing-view` | Analysis in progress, showing progress | Job ID stored |
| `MAP_READY` | `map-view` | Displaying analysis results on map | Results stored |
| `ERROR` | `error-view` | Error state with user-friendly messages | None |
| API Status | API Phase | Extension Response | State Transition |
|---|---|---|---|
| `pending` | `analyzing` | Show progress, start polling | → ANALYZING |
| `running` | Various phases | Update progress indicators | Stay in ANALYZING |
| `completed` | `completed` | Fetch location data | → MAP_READY or IDLE |
| `failed` | `failed` | Display error message | → ERROR |
Problem: Extension was jumping directly to map-view instead of showing analyzing-view for new videos.
Root Cause: The handleJobCompleted() function treated 404 API error responses as valid location data.
Solution: Added explicit error detection for 404 responses in popup-controller.ts:

    // Lines 549-582 in popup-controller.ts
    if (locations && typeof locations === "object" && (locations as any).detail) {
      const detail = String((locations as any).detail || "");
      // The Chinese alternative is the backend's "video location data not found" message
      if (/找不到影片地點資料|not\s*found/i.test(detail)) {
        // 404 response - clean state and return to IDLE
        changeState(AppState.IDLE, {
          videoId: state.videoId,
          mapVisualization: null,
          jobId: null,
          progress: 0,
          phase: null,
        });
        return;
      }
    }

| From State | To State | Trigger | Validation |
|---|---|---|---|
| `IDLE` | `CHECKING_CACHE` | User clicks analyze | Video ID exists |
| `CHECKING_CACHE` | `MAP_READY` | Cached data found | Valid location data |
| `CHECKING_CACHE` | `ANALYZING` | New analysis needed | Valid job ID returned |
| `ANALYZING` | `MAP_READY` | Job completed | Valid GeoJSON data |
| `ANALYZING` | `IDLE` | No location data | 404 error handling |
| `ANALYZING` | `ERROR` | Job failed | Error response |
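The transition rules above can be modeled as a lookup of allowed (from, to) pairs. The sketch below is a language-neutral model written in Python; the extension itself is TypeScript, and this `change_state` is a hypothetical validator rather than the extension's `changeState()`. It encodes only the six transitions listed in the table, so the real extension may allow others (for example, leaving `ERROR`).

```python
from enum import Enum


class AppState(Enum):
    IDLE = "IDLE"
    CHECKING_CACHE = "CHECKING_CACHE"
    ANALYZING = "ANALYZING"
    MAP_READY = "MAP_READY"
    ERROR = "ERROR"


# Only the transitions listed in the table; anything else is rejected.
ALLOWED_TRANSITIONS = {
    (AppState.IDLE, AppState.CHECKING_CACHE),
    (AppState.CHECKING_CACHE, AppState.MAP_READY),
    (AppState.CHECKING_CACHE, AppState.ANALYZING),
    (AppState.ANALYZING, AppState.MAP_READY),
    (AppState.ANALYZING, AppState.IDLE),
    (AppState.ANALYZING, AppState.ERROR),
}


def change_state(current: AppState, target: AppState) -> AppState:
    """Return the new state, or raise if the transition is not in the table."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = AppState.IDLE
for target in (AppState.CHECKING_CACHE, AppState.ANALYZING, AppState.MAP_READY):
    state = change_state(state, target)
print(state.name)
```

A guard of this kind would have rejected a direct `IDLE` to `MAP_READY` jump, which is the symptom described in the problem statement above.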
Enable state transition debugging:

    console.log("State transition:", oldState, "->", newState, stateData);

Monitor key events:
- State changes in `changeState()` calls
- API responses in `handleJobCompleted()`
- Chrome storage operations
- Job polling lifecycle
- `src/api/`: Modular FastAPI Backend
  - `src/api/core/`: Core API components (models, logging)
  - `src/api/routes/`: API endpoints and route handlers
  - `src/api/middleware/`: Middleware (SSE, CORS handling)
  - `src/api/services/`: Business logic services (CrewAI execution, state management, webhooks)
  - `src/api/cache/`: Caching system with CrewAI Memory integration
  - `src/api/monitoring/`: Performance monitoring and observability
- `src/trailtag/`: Enhanced CrewAI Implementation
  - `src/trailtag/core/`: Core system (crew definition, models, observers)
  - `src/trailtag/memory/`: CrewAI Memory system (manager, progress tracking)
  - `src/trailtag/tools/`: Categorized Tool Suite
    - `src/trailtag/tools/data_extraction/`: YouTube metadata, chapters, comments, descriptions
    - `src/trailtag/tools/processing/`: Subtitle processing, compression, token management
    - `src/trailtag/tools/geocoding/`: Geographic coordinate resolution
- `src/extension/`: Restructured Chrome Extension
  - `src/extension/src/core/`: Core functionality (map rendering, popup control, subtitle detection)
  - `src/extension/src/services/`: API communication services
  - `src/extension/src/utils/`: Utility functions and optimization tools
  - `src/extension/ui/`: User interface components and styles
  - `src/extension/config/`: Build and configuration files
  - `src/extension/tests/`: Test suites
- `tests/`: Comprehensive Test Suites
  - `tests/integration/`: Integration tests (E2E, memory migration validation)
  - Unit tests distributed across modules
- `scripts/`: Migration and utility scripts
- Missing or low-quality subtitles: the system will fall back to video description or chapter metadata; if nothing is available it may return partial results or mark the job as `needs_human_review`.
- Long videos: jobs can be chunked and parallelized; caching reduces repeated work.
- Unresolved geocoding: place names that cannot be geocoded are returned as unresolved with the original string for manual review.
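Two of these behaviours, source fallback and unresolved place names, can be sketched in a few lines. The fallback order between description and chapters, the record shapes, and the names `pick_text_source` and `split_resolved` are assumptions made for illustration, not the project's actual logic.

```python
from typing import Callable, Optional


def pick_text_source(
    subtitles: Optional[str], description: Optional[str], chapters: Optional[str]
) -> Optional[tuple[str, str]]:
    """Return (source_name, text) for the first non-empty source, else None."""
    candidates = (
        ("subtitle", subtitles),
        ("description", description),
        ("chapters", chapters),
    )
    for name, text in candidates:
        if text and text.strip():
            return name, text
    return None  # nothing usable: the caller may flag needs_human_review


def split_resolved(
    names: list[str],
    geocode: Callable[[str], Optional[tuple[float, float]]],
) -> tuple[list[dict], list[dict]]:
    """Separate geocoded places from those kept for manual review."""
    resolved, unresolved = [], []
    for name in names:
        coords = geocode(name)
        if coords is None:
            # Keep the original string so a reviewer can fix it by hand.
            unresolved.append({"original": name, "status": "unresolved"})
        else:
            lat, lng = coords
            resolved.append({"title": name, "coordinates": [lng, lat]})
    return resolved, unresolved


known = {"Eiffel Tower": (48.8584, 2.2945)}
print(pick_text_source(None, "Day two in Paris", None))
print(split_resolved(["Eiffel Tower", "that cafe by the river"], known.get))
```

Returning unresolved names alongside the resolved ones, rather than dropping them, keeps partial results useful and leaves a clear queue for manual review.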
- Inputs: `{ video_id: string }` or pre-parsed subtitle/timestamps JSON
- Outputs: `{ task_id, status }`; final results are GeoJSON files available via the API
- Add unit tests in `tests/` for core transformation and geocoding logic
- Update tests when changing public API behavior
- Pull requests and issues are welcome. Follow the repo's coding style and include tests for new behavior.
- See the `LICENSE` file at the repository root for the project license.
