
HarshXAI/FinityAi-Streamlit-Backend


Financial Document Analysis System

A toolkit for extracting, analyzing, and querying financial data from PDFs, spreadsheets, and scanned documents, with built-in market data analysis and SEC filing retrieval.

Overview

This system consists of three main components:

  1. Financial Data Extractor: Extracts structured financial data from PDFs, spreadsheets, and scanned documents.
  2. Financial Chatbot: An AI-powered assistant that allows users to query the extracted financial data.
  3. Market Analysis Engine: Fetches real-time market data and SEC filings for comprehensive financial analysis.

Features

Financial Data Extractor

  • Multi-format Support: Process PDF files (both text-based and scanned) and spreadsheets (CSV, Excel)
  • Advanced Table Detection: Identifies and extracts tables from documents
  • OCR Capability: Extracts text from scanned documents
  • Standardized Output: Normalizes financial terms and metrics
  • Contextual Extraction: Captures both structured tables and relevant contextual information
  • Notes & Footnotes: Extracts important notes and footnotes for comprehensive understanding
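The README does not say how "Standardized Output" normalizes financial terms. A common approach is a synonym table that maps the different labels reports use for one line item onto a single canonical key. The following is a minimal sketch under that assumption. The `CANONICAL_TERMS` table and the `normalize_label` helper are hypothetical and do not come from this repository.

```python
import re
from typing import Optional

# Hypothetical synonym table: canonical key -> labels seen in reports.
CANONICAL_TERMS = {
    "revenue": ["revenue", "total revenue", "net sales", "sales", "turnover"],
    "net_income": ["net income", "net profit", "net earnings", "profit for the year"],
    "total_assets": ["total assets"],
    "operating_income": ["operating income", "operating profit", "income from operations"],
}

# Invert the table once so each lookup is a single dict access.
_LOOKUP = {
    label: canonical
    for canonical, labels in CANONICAL_TERMS.items()
    for label in labels
}


def normalize_label(raw: str) -> Optional[str]:
    """Map a raw row label to its canonical key, or None if it is unknown."""
    # Lowercase, then drop footnote markers such as "(1)" or "*".
    cleaned = re.sub(r"\(\d+\)|\*", "", raw.lower())
    # Replace remaining punctuation with spaces and collapse whitespace.
    cleaned = re.sub(r"[^a-z ]+", " ", cleaned)
    cleaned = " ".join(cleaned.split())
    return _LOOKUP.get(cleaned)


print(normalize_label("Net Sales (1)"))
print(normalize_label("Goodwill"))
```

A table-driven mapping keeps normalization auditable. Labels that are not in the table come back as `None` rather than being guessed.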

Financial Chatbot

  • AI-Powered Queries: Allows natural language questions about financial data
  • Financial Analysis: Provides insights on financial metrics and ratios
  • Source Attribution: References which parts of the document answers came from
  • Vector Search: Uses semantic search to find the most relevant information
  • Integrated Market Data: Automatically enriches responses with market comparisons
  • Visualizations: Generates charts and graphs in response to relevant queries
  • Report Generation: Build and download comprehensive financial analysis reports in PDF or Markdown
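Vector search ranks document chunks by how similar they are to the question. This README does not specify the embedding model or vector store. To illustrate only the retrieval step, the sketch below substitutes a bag-of-words term-count vector with cosine similarity. A real deployment would use a learned embedding model instead. The `embed` and `top_k` helpers are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "Total revenue increased 12% year over year.",
    "The board approved a new share buyback program.",
    "Operating expenses rose due to higher revenue-related costs.",
]
for score, text in top_k("How much did total revenue increase?", chunks):
    print(f"{score:.2f}  {text}")
```

Source attribution follows directly from this design. Each returned chunk can carry its page or section reference, and the chatbot can cite it.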

Market Analysis & SEC Integration

  • Automatic Company Detection: Identifies the company from financial documents
  • Real-time Market Data: Fetches current financial metrics and stock performance
  • SEC Filing Download: Retrieves official SEC filings (10-K, 10-Q, 8-K, etc.)
  • Peer Comparison: Compares company performance against industry peers
  • Interactive Charts: Visual representations of financial metrics and performance
  • Comprehensive Analysis: Combines document data, market information, and SEC filings
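The README does not describe how the company is detected. One plausible approach is to look for explicit ticker mentions such as `NASDAQ: AAPL` or `$MSFT` and then check each candidate against a known symbol list. The sketch below assumes that approach. The `KNOWN_TICKERS` set and the `detect_tickers` function are hypothetical. A production system would load its symbol list from a data source rather than hard-coding it.

```python
import re
from typing import Iterable, List

# Hypothetical whitelist; in practice load this from an exchange or SEC symbol file.
KNOWN_TICKERS = {"AAPL", "MSFT", "GOOGL", "AMZN", "JPM"}

# Matches "NASDAQ: AAPL", "NYSE:JPM", or "$MSFT" and captures the symbol.
_PATTERN = re.compile(r"(?:\b(?:NASDAQ|NYSE)\s*:\s*|\$)([A-Z]{1,5})\b")


def detect_tickers(text: str, known: Iterable[str] = KNOWN_TICKERS) -> List[str]:
    """Return known tickers in order of first appearance, without duplicates."""
    valid = set(known)
    found: List[str] = []
    for symbol in _PATTERN.findall(text):
        if symbol in valid and symbol not in found:
            found.append(symbol)
    return found


print(detect_tickers("Apple Inc. (NASDAQ: AAPL) competes with $MSFT and $XYZ; NYSE:JPM."))
```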

Installation

# Clone the repository
git clone https://github.com/yourusername/financial-document-analysis.git
cd financial-document-analysis

# Install dependencies
pip install -r requirements.txt

# Install additional system dependencies
# For macOS:
brew install ghostscript tesseract

# For Ubuntu/Debian:
sudo apt-get install ghostscript tesseract-ocr

# For Windows:
# Download and install Ghostscript from: https://ghostscript.com/releases/
# Download and install Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki
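Before you start the app, you can confirm that the OCR and PDF tools are on your `PATH`. The stdlib-only sketch below checks for the usual executable names: `tesseract` for Tesseract, and `gs` (or the Windows console builds `gswin64c`/`gswin32c`) for Ghostscript. The `check_system_deps` helper is illustrative and is not part of the repository.

```python
import shutil
from typing import Dict, Optional

# Executable names to try for each tool; Ghostscript is named differently on Windows.
CANDIDATES = {
    "tesseract": ["tesseract"],
    "ghostscript": ["gs", "gswin64c", "gswin32c"],
}


def check_system_deps() -> Dict[str, Optional[str]]:
    """Map each tool to the first executable path found on PATH, or None."""
    found: Dict[str, Optional[str]] = {}
    for tool, names in CANDIDATES.items():
        found[tool] = next((path for path in map(shutil.which, names) if path), None)
    return found


if __name__ == "__main__":
    for tool, path in check_system_deps().items():
        print(f"{tool:12} {path if path else 'MISSING'}")
```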

Usage

Extracting Financial Data

from enhanced_financial_data_extractor import (
    extract_and_save_financial_data_rag,
    extract_financial_data_rag,
)

# Extract data from a document and keep the result in memory
result = extract_financial_data_rag("path/to/financial_report.pdf")

# Extract the data and save it to JSON; returns the path of the written file
output_path = extract_and_save_financial_data_rag("path/to/financial_report.pdf")
print(f"Extracted data saved to: {output_path}")

Using the Web Interface

# Run the Streamlit web app
streamlit run bot.py

Web Interface Features

  1. Document Upload: Upload financial PDFs directly through the web interface
  2. AI Chat: Ask natural language questions about the financial document
  3. Source Citations: See which parts of the document the AI used to answer
  4. Report Generation: Compile important insights into a structured report
  5. Industry Comparison: Compare company performance with industry peers
  6. SEC Filing Integration: Download and analyze official SEC filings
  7. Interactive Visualizations: View charts and graphs on demand
  8. Data Export: Export extracted financial data as JSON

SEC Edgar Integration

Access official SEC filings directly within the application:

  • Supported Filing Types: 10-K, 10-Q, 8-K, S-1, DEF 14A, and more
  • Automatic Processing: Extracts key sections from filings (Business Description, Risk Factors, etc.)
  • Integrated Analysis: Combines SEC filings with document analysis for comprehensive insights
  • Historical Filing Access: Retrieve multiple historical filings for trend analysis
  • Seamless Knowledge Base Integration: SEC data is incorporated into the AI's knowledge base

To use the SEC functionality:

  1. After you upload a financial document, the system automatically detects the company ticker
  2. In the sidebar, select the filing type and number of filings to download
  3. Click "Download SEC Filings" to retrieve and process the data
  4. Ask questions that incorporate both your document and SEC filing information
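EDGAR exposes a public submissions endpoint, `https://data.sec.gov/submissions/CIK##########.json`, with the CIK zero-padded to 10 digits. In its response, the `filings.recent` object stores parallel arrays such as `form`, `accessionNumber`, `filingDate`, and `primaryDocument`. The README does not say whether this repository uses that endpoint. The sketch below shows only how such a response could be filtered by filing type and turned into document URLs. The SEC asks automated clients to send a descriptive `User-Agent`, so `fetch_submissions` requires one. Both function names are hypothetical. The filter also assumes that the recent filings are listed newest first.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download EDGAR's submissions JSON for one company (network call)."""
    url = f"https://data.sec.gov/submissions/CIK{cik:010d}.json"
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def recent_filings(submissions: dict, cik: int, form_type: str, limit: int) -> List[Dict[str, str]]:
    """Return up to `limit` filings of one form type, with archive URLs."""
    recent = submissions["filings"]["recent"]
    results = []
    # The arrays are parallel: index i describes the same filing in each one.
    for form, accession, date, doc in zip(
        recent["form"], recent["accessionNumber"], recent["filingDate"], recent["primaryDocument"]
    ):
        if form == form_type:
            folder = accession.replace("-", "")  # archive folders drop the dashes
            results.append({
                "form": form,
                "date": date,
                "url": f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{doc}",
            })
    return results[:max(limit, 0)]
```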

Report Generation

The system allows you to build comprehensive financial analysis reports:

  • Add important insights to the report with one click
  • Automatically categorizes financial information by topic
  • Preview the report structure directly in the UI
  • Download the report as a PDF document with proper formatting
  • Download the report as Markdown for easy editing
  • Include source citations and page references
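A report like this can be modeled as a list of insights. Each insight has a category and a source citation, and the report is rendered to Markdown grouped by category. The `Insight` fields and the `render_markdown` function are a hypothetical sketch, not the repository's actual report format. PDF output would need an additional library and is left out here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Insight:
    category: str               # e.g. "Profitability" or "Liquidity"
    text: str                   # the insight itself
    source: str                 # document or filing it came from
    page: Optional[int] = None  # page reference, if known


def render_markdown(title: str, insights: List[Insight]) -> str:
    """Group insights by category (first-seen order) and cite each source."""
    groups: Dict[str, List[Insight]] = {}
    for ins in insights:
        groups.setdefault(ins.category, []).append(ins)

    lines = [f"# {title}", ""]
    for category, items in groups.items():
        lines += [f"## {category}", ""]
        for ins in items:
            cite = ins.source if ins.page is None else f"{ins.source}, p. {ins.page}"
            lines.append(f"- {ins.text} ({cite})")
        lines.append("")
    return "\n".join(lines)
```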

Output Structure

The financial data extractor produces a JSON structure with the following sections:

  • metadata: Information about the source file and extraction process
  • financial_data: Extracted tables with standardized column names
  • contextual_text: Relevant text sections from the document
  • notes: Footnotes and additional context from the document
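The extractor's JSON output is described as having the four top-level keys listed above. Assuming that holds, a consumer can check that the keys are present and summarize what each one holds. Only the key names come from this README. The inner shape of each section is not documented, so the sketch reports just each section's type and length. The `summarize_extraction` and `load_and_summarize` helpers are hypothetical.

```python
import json
from typing import Any, Dict

EXPECTED_SECTIONS = ("metadata", "financial_data", "contextual_text", "notes")


def summarize_extraction(data: Dict[str, Any]) -> Dict[str, str]:
    """Describe each expected top-level section; raise KeyError if any is missing."""
    missing = [key for key in EXPECTED_SECTIONS if key not in data]
    if missing:
        raise KeyError(f"missing sections: {', '.join(missing)}")
    summary = {}
    for key in EXPECTED_SECTIONS:
        value = data[key]
        if isinstance(value, (list, dict)):
            summary[key] = f"{type(value).__name__} (len {len(value)})"
        else:
            summary[key] = type(value).__name__
    return summary


def load_and_summarize(path: str) -> Dict[str, str]:
    """Read a saved extraction file and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_extraction(json.load(fh))
```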

Market Data Integration

The system incorporates market data into the analysis:

  • Automatic Company Detection: Identifies the target company from your document
  • Industry Peer Mapping: Suggests appropriate peer companies for comparison
  • Live Financial Data: Fetches real-time financial metrics and ratios
  • Visual Comparisons: Creates charts to visualize relative performance
  • Stock Performance Analysis: Analyzes stock price trends against industry benchmarks
  • Custom Peer Selection: Manually adjust the company ticker and the list of peer companies
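At its simplest, a peer comparison sets the company's value for one metric against the median of its peers. The sketch below assumes the metrics have already been fetched; the README does not name the market data provider. The tickers and values in the usage line are placeholders for illustration only. `compare_to_peers` is a hypothetical helper.

```python
import math
from statistics import median
from typing import Dict


def compare_to_peers(company: str, values: Dict[str, float]) -> Dict[str, float]:
    """Compare one company's metric with the median of the other tickers in `values`."""
    if company not in values:
        raise KeyError(company)
    peers = [v for ticker, v in values.items() if ticker != company]
    if not peers:
        raise ValueError("need at least one peer")
    peer_median = median(peers)
    own = values[company]
    # Relative gap in percent; undefined (NaN) when the peer median is zero.
    gap_pct = (own - peer_median) / abs(peer_median) * 100 if peer_median else math.nan
    return {"company": own, "peer_median": peer_median, "gap_pct": gap_pct}


# Placeholder tickers and values (e.g. an operating margin), not real data.
print(compare_to_peers("AAA", {"AAA": 0.25, "BBB": 0.20, "CCC": 0.10, "DDD": 0.30}))
```

The median is less sensitive than the mean to a single outlier peer, and outliers are common when peer groups are small.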

Requirements

  • Python 3.8+
  • Streamlit 1.20+
  • See requirements.txt for detailed dependencies

License

MIT License

Financial Analysis API

This API serves as an interface to the D2K financial analysis backend.

Setup

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Run the server:

    python app.py
    

API Documentation

The API includes Swagger documentation for easy exploration and testing:

  • Swagger UI: Access interactive API documentation at /api/docs when the server is running
  • Test Endpoints: Try out API endpoints directly from the Swagger UI
  • Model Schemas: View request/response schemas for all endpoints
  • API Descriptions: Get detailed information about each endpoint's functionality

API Endpoints

Health Check

  • URL: /api/health
  • Method: GET
  • Response: Status of the API

Analyze Document

  • URL: /api/analyze-document
  • Method: POST
  • Form Data:
    • file: PDF file to analyze
    • query (optional): Query to run against the document
  • Response: Analysis results

Company Data

  • URL: /api/company-data/<ticker>
  • Method: GET
  • URL Parameters: ticker (company stock symbol)
  • Response: Company information and market data

Document Query

  • URL: /api/document-query
  • Method: POST
  • Form Data:
    • file: PDF file to analyze
    • query: Query to run against the document
    • ticker (optional): Company ticker symbol for additional context
  • Response: Query results based on document content
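Both `/api/analyze-document` and `/api/document-query` take multipart form data. A client built only on the standard library could assemble the request as shown below. Two things are assumptions: the default base URL `http://localhost:5000` (this README does not say which port `app.py` listens on), and that the response body is JSON. `build_multipart` and `document_query` are hypothetical helper names.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Dict, Optional, Tuple


def build_multipart(fields: Dict[str, str], file_field: str, filename: str,
                    content: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    )
    parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def document_query(pdf_path: str, query: str, ticker: Optional[str] = None,
                   base_url: str = "http://localhost:5000") -> dict:
    """POST a PDF and a question to /api/document-query (assumes a JSON response)."""
    fields = {"query": query}
    if ticker:
        fields["ticker"] = ticker  # optional extra context
    with open(pdf_path, "rb") as fh:
        body, ctype = build_multipart(fields, "file", os.path.basename(pdf_path), fh.read())
    req = urllib.request.Request(f"{base_url}/api/document-query", data=body,
                                 headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

For `/api/analyze-document`, the same body works. Just change the path and make `query` optional.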

Generate Report

  • URL: /api/generate-report
  • Method: POST
  • Query Parameters:
    • format (optional): Output format (markdown or pdf, default: markdown)
  • Request Body: JSON with report data
  • Response: Generated report in markdown or PDF format

Extract Tickers

  • URL: /api/extract-tickers
  • Method: POST
  • Request Body: JSON object with a query field containing the text to scan
  • Response: Extracted ticker symbols from query text
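For the GET and JSON endpoints, a minimal standard-library client might look like the sketch below. The base URL is an assumption. The report payload shape is not documented in this README, so `generate_report_request` forwards whatever dictionary it receives. The helpers return `urllib.request.Request` objects, which you can inspect before `send` makes the network call. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: point this at wherever app.py is served


def health_request(base: str = BASE_URL) -> urllib.request.Request:
    return urllib.request.Request(f"{base}/api/health", method="GET")


def company_data_request(ticker: str, base: str = BASE_URL) -> urllib.request.Request:
    # Percent-encode the ticker so unusual symbols cannot break the path.
    symbol = urllib.parse.quote(ticker, safe="")
    return urllib.request.Request(f"{base}/api/company-data/{symbol}", method="GET")


def _json_post(url: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tickers_request(query: str, base: str = BASE_URL) -> urllib.request.Request:
    return _json_post(f"{base}/api/extract-tickers", {"query": query})


def generate_report_request(report: dict, fmt: str = "markdown",
                            base: str = BASE_URL) -> urllib.request.Request:
    if fmt not in ("markdown", "pdf"):
        raise ValueError("format must be 'markdown' or 'pdf'")
    qs = urllib.parse.urlencode({"format": fmt})
    return _json_post(f"{base}/api/generate-report?{qs}", report)


def send(req: urllib.request.Request) -> bytes:
    """Perform the HTTP call and return the raw body (JSON, Markdown, or PDF bytes)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```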
