@@ -106,6 +106,7 @@ Flo AI Studio is a modern, intuitive visual editor that allows you to design com
  - [Create an Agent with Structured Output](#create-an-agent-with-structured-output)
- [📝 YAML Configuration](#-yaml-configuration)
- [🔧 Variables System](#-variables-system)
- [📄 Document Processing](#-document-processing)
- [🛠️ Tools](#️-tools)
  - [🎯 @flo_tool Decorator](#-flo_tool-decorator)
- [🧠 Reasoning Patterns](#-reasoning-patterns)
@@ -573,6 +574,271 @@ asyncio.run(variable_validation_example())

The variables system makes Flo AI agents reusable and configurable, enabling flexible AI workflows that adapt to different contexts and requirements.

## 📄 Document Processing

Flo AI agents can analyze documents passed to them directly as input. The framework supports PDF and TXT documents, and its architecture is extensible so that new formats can be added.

### ✨ Key Features

- **📄 Multi-Format Support**: Process PDF and TXT documents seamlessly
- **🔄 Multiple Input Methods**: File paths, bytes data, or base64 encoded content
- **🧠 LLM Integration**: Direct document input to AI agents for analysis
- **⚡ Async Processing**: Efficient document handling with async/await support
- **🔧 Extensible Architecture**: Easy to add support for new document types
- **📊 Rich Metadata**: Extract page counts, processing methods, and document statistics

### Basic Document Processing

```python
import asyncio
from flo_ai.builder.agent_builder import AgentBuilder
from flo_ai.llm import OpenAI
from flo_ai.models.document import DocumentMessage, DocumentType

async def basic_document_analysis():
    # Create document message from file path
    document = DocumentMessage(
        document_type=DocumentType.PDF,
        document_file_path='path/to/your/document.pdf'
    )

    # Create document analysis agent
    agent = (
        AgentBuilder()
        .with_name('Document Analyzer')
        .with_prompt('Analyze the provided document and extract key insights, themes, and important information.')
        .with_llm(OpenAI(model='gpt-4o-mini'))
        .build()
    )

    # Process document with agent
    result = await agent.run([document])
    print(f'Analysis: {result}')

asyncio.run(basic_document_analysis())
```

### Multiple Input Methods

Flo AI supports three ways to provide document content:

#### 1. File Path (Recommended)
```python
from flo_ai.models.document import DocumentMessage, DocumentType

document = DocumentMessage(
    document_type=DocumentType.PDF,
    document_file_path='/path/to/document.pdf'
)
```

#### 2. Bytes Data
```python
from flo_ai.models.document import DocumentMessage, DocumentType

# Read file as bytes
with open('document.pdf', 'rb') as f:
    pdf_bytes = f.read()

document = DocumentMessage(
    document_type=DocumentType.PDF,
    document_bytes=pdf_bytes,
    mime_type='application/pdf'
)
```

#### 3. Base64 Encoded
```python
import base64
from flo_ai.models.document import DocumentMessage, DocumentType

# Encode file to base64
with open('document.pdf', 'rb') as f:
    pdf_base64 = base64.b64encode(f.read()).decode('utf-8')

document = DocumentMessage(
    document_type=DocumentType.PDF,
    document_base64=pdf_base64,
    mime_type='application/pdf'
)
```

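The bytes and base64 input methods carry the same payload in two encodings; base64 text is roughly a third larger than the raw bytes. The following standard-library sketch shows the round trip, which can be useful for checking a payload before passing it as `document_base64`. The helper names `encode_document` and `decode_document` are illustrative and are not part of Flo AI.

```python
import base64


def encode_document(raw: bytes) -> str:
    """Return the base64 text form of raw document bytes (hypothetical helper)."""
    return base64.b64encode(raw).decode('ascii')


def decode_document(encoded: str) -> bytes:
    """Decode base64 text back to bytes, rejecting malformed input."""
    # validate=True raises binascii.Error (a ValueError subclass)
    # when the input contains characters outside the base64 alphabet
    return base64.b64decode(encoded, validate=True)


sample = b'%PDF-1.7 example payload'
encoded = encode_document(sample)

# The decoded bytes match the original payload exactly
assert decode_document(encoded) == sample
```
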
### Document Processing in Workflows

Documents can be integrated into Arium workflows:

```python
import asyncio
from flo_ai.arium import AriumBuilder
from flo_ai.models.document import DocumentMessage, DocumentType

async def document_workflow():
    # Create document message
    document = DocumentMessage(
        document_type=DocumentType.PDF,
        document_file_path='business_report.pdf'
    )

    # Define workflow YAML
    workflow_yaml = """
    metadata:
      name: document-analysis-workflow
      version: 1.0.0
      description: "Multi-agent document analysis pipeline"

    arium:
      agents:
        - name: intake_agent
          role: "Document Intake Specialist"
          job: "Process and assess document content for analysis."
          model:
            provider: openai
            name: gpt-4o-mini

        - name: content_analyzer
          role: "Content Analyst"
          job: "Analyze document content for themes, insights, and key information."
          model:
            provider: openai
            name: gpt-4o-mini

        - name: summary_generator
          role: "Summary Writer"
          job: "Create comprehensive summaries of analyzed content."
          model:
            provider: openai
            name: gpt-4o-mini

      workflow:
        start: intake_agent
        edges:
          - from: intake_agent
            to: [content_analyzer]
          - from: content_analyzer
            to: [summary_generator]
        end: [summary_generator]
    """

    # Run workflow with document
    result = await (
        AriumBuilder()
        .from_yaml(yaml_str=workflow_yaml)
        .build_and_run([document, 'Analyze this business report and provide insights'])
    )

    return result

asyncio.run(document_workflow())
```

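The workflow section of the YAML is a small directed graph: a `start` node, a list of `edges`, and one or more `end` nodes. A typo in an agent name is easy to miss, so it can help to sanity-check the graph before running it. The sketch below does this on a plain dictionary that mirrors the YAML structure; `validate_workflow` is a hypothetical helper, and whether Flo AI performs similar checks internally is not covered here.

```python
def validate_workflow(agent_names, workflow):
    """Return a list of problems found in a start/edges/end workflow mapping."""
    known = set(agent_names)
    problems = []

    if workflow['start'] not in known:
        problems.append(f"unknown start node: {workflow['start']}")

    # Every edge endpoint must refer to a declared agent
    for edge in workflow['edges']:
        for node in [edge['from'], *edge['to']]:
            if node not in known:
                problems.append(f'unknown node in edge: {node}')

    for node in workflow['end']:
        if node not in known:
            problems.append(f'unknown end node: {node}')

    return problems


agents = ['intake_agent', 'content_analyzer', 'summary_generator']
workflow = {
    'start': 'intake_agent',
    'edges': [
        {'from': 'intake_agent', 'to': ['content_analyzer']},
        {'from': 'content_analyzer', 'to': ['summary_generator']},
    ],
    'end': ['summary_generator'],
}

print(validate_workflow(agents, workflow))  # prints []
```
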
### Advanced Document Processing

#### Custom Document Metadata
```python
from flo_ai.models.document import DocumentMessage, DocumentType

document = DocumentMessage(
    document_type=DocumentType.PDF,
    document_file_path='report.pdf',
    metadata={
        'source': 'quarterly_reports',
        'department': 'finance',
        'priority': 'high',
        'tags': ['financial', 'q4-2024']
    }
)
```

#### Processing Different Document Types
```python
import asyncio
from flo_ai.builder.agent_builder import AgentBuilder
from flo_ai.llm import OpenAI
from flo_ai.models.document import DocumentMessage, DocumentType

async def analyze_multiple_formats():
    # PDF Document
    pdf_doc = DocumentMessage(
        document_type=DocumentType.PDF,
        document_file_path='presentation.pdf'
    )

    # Text Document
    txt_doc = DocumentMessage(
        document_type=DocumentType.TXT,
        document_file_path='notes.txt'
    )

    # Process both with the same agent
    agent = (
        AgentBuilder()
        .with_name('Multi-Format Analyzer')
        .with_prompt('Analyze the provided document and summarize its key points.')
        .with_llm(OpenAI(model='gpt-4o-mini'))
        .build()
    )

    pdf_result = await agent.run([pdf_doc])
    txt_result = await agent.run([txt_doc])
    return pdf_result, txt_result

asyncio.run(analyze_multiple_formats())
```

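When document paths come from user input, the document type usually has to be derived from the file extension rather than hard-coded. A minimal sketch of that mapping follows. `DocKind` is a stand-in enum so the example runs without Flo AI installed; in real code the mapping would point at `DocumentType.PDF` and `DocumentType.TXT`. The helper name `kind_from_path` is an assumption for illustration.

```python
from enum import Enum
from pathlib import Path


class DocKind(Enum):
    """Stand-in for flo_ai's DocumentType, so the sketch runs on its own."""
    PDF = 'pdf'
    TXT = 'txt'


# Lowercase suffix -> document kind; extend this when new formats are supported
_BY_SUFFIX = {'.pdf': DocKind.PDF, '.txt': DocKind.TXT}


def kind_from_path(path: str) -> DocKind:
    """Map a file path to a document kind, failing loudly on unknown extensions."""
    suffix = Path(path).suffix.lower()
    try:
        return _BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f'unsupported document extension: {suffix!r}') from None


print(kind_from_path('presentation.PDF'))  # prints DocKind.PDF
```

Raising on unknown extensions, instead of silently falling back to TXT, surfaces unsupported files before they reach an agent.
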
### Document Processing Tools

Create custom tools for document operations:

```python
from flo_ai.builder.agent_builder import AgentBuilder
from flo_ai.llm import OpenAI
from flo_ai.models.document import DocumentMessage, DocumentType
from flo_ai.tool import flo_tool

@flo_tool(description="Extract key information from documents")
async def extract_document_info(document_path: str, doc_type: str) -> str:
    """Extract key information from a document."""
    # Anything other than 'pdf' is treated as plain text
    document_type = DocumentType.PDF if doc_type.lower() == 'pdf' else DocumentType.TXT

    document = DocumentMessage(
        document_type=document_type,
        document_file_path=document_path
    )

    # Use document processing agent
    agent = (
        AgentBuilder()
        .with_name('Info Extractor')
        .with_llm(OpenAI(model='gpt-4o-mini'))
        .build()
    )
    result = await agent.run([document])

    return str(result)

# Use in agent
agent = (
    AgentBuilder()
    .with_name('Document Processor')
    .with_llm(OpenAI(model='gpt-4o-mini'))
    .with_tools([extract_document_info.tool])
    .build()
)
```

### Error Handling

```python
from flo_ai.models.document import DocumentMessage, DocumentType
from flo_ai.utils.document_processor import DocumentProcessingError

async def analyze_safely(agent):
    # `agent` is an agent built as in the examples above
    try:
        document = DocumentMessage(
            document_type=DocumentType.PDF,
            document_file_path='nonexistent.pdf'
        )
        return await agent.run([document])
    except DocumentProcessingError as e:
        print(f'Document processing failed: {e}')
    except FileNotFoundError:
        print('Document file not found')
```

### Supported Document Types

| Type | Extension | Description | Processing Method |
|------|-----------|-------------|-------------------|
| PDF | `.pdf` | Portable Document Format | PyMuPDF4LLM (LLM-optimized) |
| TXT | `.txt` | Plain text files | UTF-8 with encoding detection |

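The table lists "UTF-8 with encoding detection" for text files without saying how detection works. One common approach, shown here as a sketch and not as Flo AI's actual implementation, is to try UTF-8 first and fall back to a short list of legacy encodings. The function name `decode_text` and the fallback list are assumptions.

```python
def decode_text(raw: bytes, fallbacks=('cp1252', 'latin-1')) -> tuple[str, str]:
    """Decode bytes as UTF-8 first, then try fallbacks; return (text, encoding)."""
    for encoding in ('utf-8', *fallbacks):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if the caller passes fallbacks that can all fail
    raise ValueError('could not decode text with the configured encodings')


# Valid UTF-8 is decoded as UTF-8
print(decode_text('héllo'.encode('utf-8')))  # prints ('héllo', 'utf-8')

# A lone 0xE9 byte is invalid UTF-8, so the first fallback is used
print(decode_text(b'caf\xe9'))  # prints ('café', 'cp1252')
```

Because `latin-1` maps every byte to a character, it never fails; keeping it last means it acts as a final catch-all and does not mask a better match.
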
### Best Practices

1. **File Validation**: Always check if files exist before processing
2. **Memory Management**: Use file paths for large documents to avoid memory issues
3. **Error Handling**: Implement proper error handling for document processing failures
4. **Metadata**: Add relevant metadata to help agents understand document context
5. **Format Selection**: Choose the most appropriate input method for your use case

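The first two practices can be combined into a small pre-flight check that runs before a document is handed to an agent. The sketch below uses only the standard library; `check_document` and the `LARGE_FILE_BYTES` threshold are illustrative assumptions, not Flo AI settings.

```python
import tempfile
from pathlib import Path

LARGE_FILE_BYTES = 10 * 1024 * 1024  # illustrative threshold, not a Flo AI limit


def check_document(path: str) -> dict:
    """Validate a document path before handing it to an agent (hypothetical helper)."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f'document not found: {path}')

    size = file.stat().st_size
    if size == 0:
        raise ValueError(f'document is empty: {path}')

    # For large files, passing the path avoids holding an extra copy in memory
    return {
        'path': str(file),
        'size_bytes': size,
        'prefer_file_path': size > LARGE_FILE_BYTES,
    }


# Demonstrate with a temporary file so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'notes.txt'
    sample.write_text('quarterly summary', encoding='utf-8')
    info = check_document(str(sample))
    print(info['size_bytes'], info['prefer_file_path'])  # prints: 17 False
```
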
### Use Cases

- 📊 **Document Analysis**: Extract insights from reports, papers, and documents
- 📝 **Content Summarization**: Create summaries of long documents
- 🔍 **Information Extraction**: Pull specific data from structured documents
- 📋 **Document Classification**: Categorize documents based on content
- 🤖 **Multi-Agent Workflows**: Process documents through specialized agent pipelines
- 📈 **Business Intelligence**: Analyze business documents for insights and trends

The document processing system lets Flo AI agents and workflows read and reason over PDF and TXT content, which makes the framework suitable for real-world applications that depend on documents.

## 🛠️ Tools

Create custom tools easily with async support:
@@ -834,6 +1100,7 @@ Check out the `examples/` directory for comprehensive examples:
- `usage.py` and `usage_claude.py` - Provider-specific examples
- `vertexai_agent_example.py` - Google VertexAI integration examples
- `ollama_agent_example.py` - Local Ollama model examples
- `document_processing_example.py` - Document processing with PDF and TXT files

## 🚀 Advanced Features
