belloibrahv
Contributor

The implementation follows the approach of other LLM providers and uses the BEDROCK_API_KEY and BEDROCK_REGION environment variables for authentication.

This resolves issue #1162.

@belloibrahv
Contributor Author

@georgeh0 @badmonster0 please help me review this PR

@badmonster0
Member

Thanks @belloibrahv, could you share how you tested this: which example you used, and what the results look like?

@belloibrahv
Contributor Author

Hi @badmonster0,

Thanks for the feedback. I've updated the pull request to include a new example that demonstrates how to test the AWS Bedrock integration.

Testing Example and Workflow

To answer your question about testing, I've added a new example located at examples/bedrock_llm_extraction. This example is now part of the PR and serves as a live demonstration of the feature.

Workflow

The example demonstrates a real-world use case:

  1. It reads PDF files from a local directory.
  2. It converts them to Markdown.
  3. It uses the new Bedrock LLM integration to extract structured data (ModuleInfo) from the Markdown content.
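As a rough illustration of the data shape in step 3, the extraction target could be modeled as below. This is a minimal sketch, not the code in the PR: the `ClassInfo` and `MethodInfo` types, the fields other than `title`, and the `summarize_module` helper are assumptions, inferred only from the `title`, `num_classes`, and `num_methods` keys that appear in this example's expected output. The actual definitions are in the example's main.py.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical: one class described in a module manual."""
    name: str
    description: str = ""


@dataclass
class MethodInfo:
    """Hypothetical: one function or method described in a module manual."""
    name: str
    description: str = ""


@dataclass
class ModuleInfo:
    """Assumed shape of the structured data the LLM is asked to return."""
    title: str
    description: str = ""
    classes: list[ClassInfo] = field(default_factory=list)
    methods: list[MethodInfo] = field(default_factory=list)


@dataclass
class ModuleSummary:
    """Counts derived from a ModuleInfo (keys match the module_summary column)."""
    num_classes: int
    num_methods: int


def summarize_module(info: ModuleInfo) -> ModuleSummary:
    # Count the extracted entries; an empty extraction yields 0 for both.
    return ModuleSummary(
        num_classes=len(info.classes),
        num_methods=len(info.methods),
    )
```

For instance, `asdict(summarize_module(ModuleInfo(title="x")))` gives a dictionary with both counts at 0, which is the shape stored in the `module_summary` column.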

How to Run the Test

You can run the test by following the instructions in the new examples/bedrock_llm_extraction/README.md file. Here's a summary of the steps:

  1. Set up the environment: Copy the examples/bedrock_llm_extraction/.env.example file to .env and fill in your AWS Bedrock credentials.

    cp examples/bedrock_llm_extraction/.env.example examples/bedrock_llm_extraction/.env

  2. Run the pipeline: From the root of the project, execute:

    pip install -e ./examples/bedrock_llm_extraction
    cocoindex setup examples/bedrock_llm_extraction/main.py
    cocoindex update examples/bedrock_llm_extraction/main.py
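Before running the pipeline, it may help to confirm that the two variables this PR relies on, `BEDROCK_API_KEY` and `BEDROCK_REGION`, are actually set. The helper below is a hypothetical pre-flight check, not part of the PR. It only inspects a mapping (by default the process environment), so it assumes the values from .env have already been loaded into the environment by whatever mechanism the example uses.

```python
import os

# The two variable names come from the PR description; everything else
# in this snippet is an illustrative assumption.
REQUIRED_VARS = ("BEDROCK_API_KEY", "BEDROCK_REGION")


def missing_bedrock_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_bedrock_vars()` with no arguments checks the current process environment; a non-empty result names the variables that still need to be filled in.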

Expected Results

After a successful run, a modules_info table will be populated in your Postgres database. You can verify the extracted data with the following SQL query:

SELECT filename, module_info->'title' AS title, module_summary FROM modules_info;

The output should look similar to this:

      filename       |          title           |            module_summary
---------------------+--------------------------+--------------------------------------
 manuals/asyncio.pdf | "asyncio — Asynchronous" | {"num_classes": 0, "num_methods": 0}
 manuals/json.pdf    | "json — JSON encoder"    | {"num_classes": 0, "num_methods": 0}

Current Status

The example is fully prepared and ready for verification. However, I have not been able to run the final end-to-end test myself: an issue with my AWS account verification is preventing me from obtaining a BEDROCK_API_KEY.

Since the example is now included in the PR, I hope it makes it easy for you to test the changes on your end.

Thanks again for your guidance and support.

This commit adds support for AWS Bedrock for LLM parsing.

The implementation follows the approach of other LLM providers and uses the `BEDROCK_API_KEY` and `BEDROCK_REGION` environment variables for authentication.

This resolves issue cocoindex-io#1162.
@belloibrahv force-pushed the feature/add-aws-bedrock-llm-support branch from ddc60ae to 736b580 on October 9, 2025 15:29