CodeWithOz/demo-enrichment-agent

Enrichment Agent

An async LangGraph-based agent that:

  • Extracts named entities from a video transcript
  • Verifies canonical names for each entity using web search context
  • Iteratively replaces entity mentions in the transcript with their canonical names
  • Reviews each replacement and controls looping with a max of 2 passes per entity

Read the associated blog post.

Quick Start

  • Python: 3.13+
  • OS: macOS/Linux/Windows

1) Clone and enter the project

git clone <your-fork-or-repo-url>
cd demo-enrichment-agent

2) Set up the environment with uv

This project uses uv with a lockfile (uv.lock). Install uv and sync deps from the lock:

# Install uv (see https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Or: pipx install uv

# From the project root, create/sync the project environment from pyproject + uv.lock
uv sync

Notes:

  • uv sync will create and manage a project-local virtualenv automatically.
  • Dependencies are declared in pyproject.toml and pinned by uv.lock.

3) Configure environment variables

Copy .env.example to .env and fill in the values. The agent expects OPENAI_API_KEY and TAVILY_API_KEY to be set.
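
Before running the agent, it can help to confirm that the keys are actually visible to the process. The following is a minimal sketch, not part of the project: the helper missing_env_keys is hypothetical, and the key names are taken from the Troubleshooting section. It assumes the values from .env have already been loaded into the environment (how the project loads .env is not stated here).

```python
from collections.abc import Mapping

# Key names come from the Troubleshooting section of this README.
# The helper below is a hypothetical illustration, not part of the project.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_env_keys(
    env: Mapping[str, str],
    required: tuple[str, ...] = REQUIRED_KEYS,
) -> list[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]
```

For example, calling missing_env_keys(os.environ) after importing os returns an empty list when both keys are set, and the names of the missing keys otherwise.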

4) Run the agent (via uv)

uv run main.py

  • By default, main.py runs the agent on a sample transcript string. To use your own transcript, edit the input in main.py.
  • To render a Mermaid PNG diagram of the graph, call DemoEnrichmentAgent.draw_graph() in main.py.

How It Works

The agent is implemented in src/enrichment_agent/agent.py as DemoEnrichmentAgent using LangGraph.

State (AgentState)

  • transcript_text: str — original transcript input
  • extracted_entities: NamedEntities — entities extracted by LLM
  • verified_entities: list[VerifiedEntity] — per-entity canonical names
  • updated_transcript_text: str — the in-progress, replaced transcript
  • replacement_loop_idx: int — index of the current entity in verified_entities; updates are additive (nodes return an increment, not an absolute value)
  • replacement_pass_count: int — number of replacement attempts made so far for the current entity
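
As a rough illustration of the state listed above, the sketch below declares it as a TypedDict and marks replacement_loop_idx with an operator.add reducer, which is how LangGraph expresses additive updates on a state key. The field names come from this README; the fields of NamedEntity, NamedEntities and VerifiedEntity (other than canonical_name) are assumptions, as is the choice of which other keys have no reducer. The actual definitions in agent.py may differ.

```python
import operator
from dataclasses import dataclass, field
from typing import Annotated, TypedDict


@dataclass
class NamedEntity:
    name: str  # hypothetical field


@dataclass
class NamedEntities:
    entities: list[NamedEntity] = field(default_factory=list)  # hypothetical field


@dataclass
class VerifiedEntity:
    original_name: str   # hypothetical field
    canonical_name: str  # named in this README


class AgentState(TypedDict):
    transcript_text: str
    extracted_entities: NamedEntities
    verified_entities: list[VerifiedEntity]
    updated_transcript_text: str
    # Additive: a node returning {"replacement_loop_idx": 1} advances the index by 1.
    replacement_loop_idx: Annotated[int, operator.add]
    # No reducer: a returned value overwrites the current one (e.g. reset to 0).
    replacement_pass_count: int
```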

Graph Nodes

  • extractor — Uses an LLM to extract NamedEntity items from transcript_text
  • get_verified_entity_worker — For each NamedEntity, researches and produces a VerifiedEntity with a canonical_name
  • replace_entity — Uses an LLM to replace occurrences of the current entity with the canonical name, updates updated_transcript_text, and increments replacement_pass_count
  • replacement_reviewer — Uses an LLM to check if the current entity has been fully replaced in updated_transcript_text. Controls whether to advance the loop.

Loop Control

  • The graph uses replacement_loop_idx with additive semantics. Nodes return the increment, not the absolute index (e.g., {"replacement_loop_idx": 1} to advance by 1).
  • replacement_reviewer_node() decides if we move to the next entity:
    • Advance when the entity is fully replaced, or when replacement_pass_count >= 2.
    • Reset replacement_pass_count to 0 when advancing or when there are no more entities.
    • This guarantees a maximum of 2 replacement attempts per entity.
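
The rules above can be sketched in plain Python, without LangGraph, to show why the 2-pass bound holds. This is an illustrative model under stated assumptions, not the code in agent.py: reviewer_update, apply_update and count_attempts are hypothetical names, apply_update imitates an additive reducer on replacement_loop_idx, and the case where no entities remain is not modeled.

```python
MAX_PASSES = 2  # per-entity limit described in this README


def reviewer_update(fully_replaced: bool, pass_count: int) -> dict[str, int]:
    """Return the state update the reviewer would emit (hypothetical sketch)."""
    if fully_replaced or pass_count >= MAX_PASSES:
        # Advance by one entity (an increment, not an absolute index) and reset the counter.
        return {"replacement_loop_idx": 1, "replacement_pass_count": 0}
    # Stay on the same entity: an increment of 0 leaves the index unchanged.
    return {"replacement_loop_idx": 0}


def apply_update(state: dict[str, int], update: dict[str, int]) -> dict[str, int]:
    """Merge an update: add to replacement_loop_idx, overwrite every other key."""
    merged = dict(state)
    for key, value in update.items():
        if key == "replacement_loop_idx":
            merged[key] = merged.get(key, 0) + value
        else:
            merged[key] = value
    return merged


def count_attempts(num_entities: int) -> list[int]:
    """Worst case: the reviewer never reports a full replacement."""
    state = {"replacement_loop_idx": 0, "replacement_pass_count": 0}
    attempts = [0] * num_entities
    while state["replacement_loop_idx"] < num_entities:
        attempts[state["replacement_loop_idx"]] += 1
        # replace_entity step: one more pass on the current entity.
        state = apply_update(
            state, {"replacement_pass_count": state["replacement_pass_count"] + 1}
        )
        # replacement_reviewer step: decide whether to advance.
        state = apply_update(
            state, reviewer_update(False, state["replacement_pass_count"])
        )
    return attempts
```

Even when every review fails, count_attempts(3) yields two attempts for each of the three entities, so the loop always terminates.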

Models and Providers

Models are instantiated with LangChain’s init_chat_model and structured output:

  • Extractor LLM: gpt-4o-mini
  • Entity Verifier LLM: gpt-4o-mini
  • Entity Replacer LLM: gpt-4o-mini
  • Replacement Reviewer LLM: gpt-4o-mini

You will need a valid OPENAI_API_KEY for these to work.

Running Details

Entry point: main.py

  • Instantiate DemoEnrichmentAgent
  • Optionally generate a graph image with draw_graph()
  • Provide a transcript string and run asyncio.run(enrichment_agent.start(vid_transcript))
  • The agent prints intermediate logs about extraction, verification, replacement passes, and loop routing. Because each entity can take several replace/review steps, the run sets a recursion_limit above LangGraph's default of 25.

Project Structure

/(repo root)
├── main.py
├── README.md
├── .env.example
├── .env (git-ignored; you create this)
└── src/
    └── enrichment_agent/
        ├── __init__.py
        └── agent.py

Troubleshooting

  • Missing API keys: ensure OPENAI_API_KEY and TAVILY_API_KEY are set.
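  • Import errors for LangChain/LangGraph: verify that dependencies are installed (uv sync) and that you are running through uv run or with the project venv active.
  • Graph rendering issues: comment out the draw_graph() call if rendering fails, and check that your environment supports generating the Mermaid PNG image.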
  • Infinite loop concerns: the reviewer enforces a 2-pass maximum per entity and resets the counter when advancing.

License

MIT.