Financial Document Analysis System

A toolkit for extracting, analyzing, and querying financial data from multiple document formats, with market data analysis and SEC filing retrieval built in.

Overview

This system consists of three main components:

Financial Data Extractor: Extracts structured financial data from PDFs, spreadsheets, and scanned documents.
Financial Chatbot: An AI-powered assistant that allows users to query the extracted financial data.
Market Analysis Engine: Fetches real-time market data and SEC filings for comprehensive financial analysis.

Features

Financial Data Extractor

Multi-format Support: Process PDF files (both text-based and scanned) and spreadsheets (CSV, Excel)
Advanced Table Detection: Identifies and extracts tables from documents
OCR Capability: Extracts text from scanned documents
Standardized Output: Normalizes financial terms and metrics
Contextual Extraction: Captures both structured tables and relevant contextual information
Notes & Footnotes: Extracts important notes and footnotes for comprehensive understanding

Financial Chatbot

AI-Powered Queries: Allows natural language questions about financial data
Financial Analysis: Provides insights on financial metrics and ratios
Source Attribution: Cites the parts of the document each answer was drawn from
Vector Search: Uses semantic search to find the most relevant information
Integrated Market Data: Automatically enriches responses with market comparisons
Visualizations: Generates charts and graphs in response to relevant queries
Report Generation: Builds downloadable financial analysis reports in PDF or Markdown

Market Analysis & SEC Integration

Automatic Company Detection: Identifies the company from financial documents
Real-time Market Data: Fetches current financial metrics and stock performance
SEC Filing Download: Retrieves official SEC filings (10-K, 10-Q, 8-K, etc.)
Peer Comparison: Compares company performance against industry peers
Interactive Charts: Visual representations of financial metrics and performance
Comprehensive Analysis: Combines document data, market information, and SEC filings

Installation

# Clone the repository
git clone https://github.com/yourusername/financial-document-analysis.git
cd financial-document-analysis

# Install dependencies
pip install -r requirements.txt

# Install additional system dependencies
# For macOS:
brew install ghostscript tesseract

# For Ubuntu/Debian:
sudo apt-get install ghostscript tesseract-ocr

# For Windows:
# Download and install Ghostscript from: https://ghostscript.com/releases/
# Download and install Tesseract from: https://github.com/UB-Mannheim/tesseract/wiki

Usage

Extracting Financial Data

from enhanced_financial_data_extractor import (
    extract_and_save_financial_data_rag,
    extract_financial_data_rag,
)

# Extract data from a document and keep the result in memory
result = extract_financial_data_rag("path/to/financial_report.pdf")

# Extract data and save it to JSON; the call returns the output path
output_path = extract_and_save_financial_data_rag("path/to/financial_report.pdf")
print(f"Extracted data saved to: {output_path}")

Using the Web Interface

# Run the Streamlit web app
streamlit run bot.py

Web Interface Features

Document Upload: Upload financial PDFs directly through the web interface
AI Chat: Ask natural language questions about the financial document
Source Citations: See which parts of the document the AI used to answer
Report Generation: Compile important insights into a structured report
Industry Comparison: Compare company performance with industry peers
SEC Filing Integration: Download and analyze official SEC filings
Interactive Visualizations: View charts and graphs on demand
Data Export: Export extracted financial data as JSON

SEC EDGAR Integration

Access official SEC filings directly within the application:

Supported Filing Types: 10-K, 10-Q, 8-K, S-1, DEF 14A, and more
Automatic Processing: Extracts key sections from filings (Business Description, Risk Factors, etc.)
Integrated Analysis: Combines SEC filings with document analysis for comprehensive insights
Historical Filing Access: Retrieve multiple historical filings for trend analysis
Seamless Knowledge Base Integration: SEC data is incorporated into the AI's knowledge base

To use the SEC functionality:

Upload a financial document; the system automatically detects the company ticker
In the sidebar, select the filing type and number of filings to download
Click "Download SEC Filings" to retrieve and process the data
Ask questions that incorporate both your document and SEC filing information

Report Generation

The system allows you to build comprehensive financial analysis reports:

Add important insights to the report with one click
Let the system categorize financial information by topic automatically
Preview the report structure directly in the UI
Download the report as a PDF document with proper formatting
Download the report as Markdown for easy editing
Include source citations and page references

Output Structure

The financial data extractor produces a JSON structure with the following sections:

metadata: Information about the source file and extraction process
financial_data: Extracted tables with standardized column names
contextual_text: Relevant text sections from the document
notes: Footnotes and additional context from the document
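
As a quick sanity check on an extraction result, the sketch below loads the JSON file and reports which of the four sections are present. It is only an illustration: the helper names `load_extraction` and `summarize_extraction` are hypothetical, and the shape of each section's value (list or dict) is an assumption, since only the top-level keys are documented here.

```python
import json
from typing import Any, Dict

# Top-level keys listed in the Output Structure section.
EXPECTED_SECTIONS = ("metadata", "financial_data", "contextual_text", "notes")


def load_extraction(path: str) -> Dict[str, Any]:
    """Read an extractor output file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_extraction(data: Dict[str, Any]) -> Dict[str, str]:
    """Describe each expected section, or flag it as missing."""
    summary: Dict[str, str] = {}
    for section in EXPECTED_SECTIONS:
        if section not in data:
            summary[section] = "missing"
            continue
        value = data[section]
        # Assumption: sections are lists or dicts; anything else is
        # reported by type name only.
        if isinstance(value, (list, dict)):
            summary[section] = f"{type(value).__name__} with {len(value)} item(s)"
        else:
            summary[section] = type(value).__name__
    return summary
```

Calling `summarize_extraction(load_extraction(output_path))` after an extraction run gives a one-line status per section, which makes empty or missing sections easy to spot before querying the data.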

Market Data Integration

The system incorporates market data in the following ways:

Automatic Company Detection: Identifies the target company from your document
Industry Peer Mapping: Suggests appropriate peer companies for comparison
Live Financial Data: Fetches real-time financial metrics and ratios
Visual Comparisons: Creates charts to visualize relative performance
Stock Performance Analysis: Analyzes stock price trends against industry benchmarks
Custom Peer Selection: Manually adjust company ticker and peer companies

Requirements

Python 3.8+
Streamlit 1.20+
See requirements.txt for detailed dependencies

License

MIT License

Financial Analysis API

This API serves as an interface to the D2K financial analysis backend.

Setup

Install dependencies:
```
pip install -r requirements.txt
```
Run the server:
```
python app.py
```

API Documentation

The API includes Swagger documentation for easy exploration and testing:

Swagger UI: Access interactive API documentation at /api/docs when the server is running
Test Endpoints: Try out API endpoints directly from the Swagger UI
Model Schemas: View request/response schemas for all endpoints
API Descriptions: Get detailed information about each endpoint's functionality

API Endpoints

Health Check

URL: /api/health
Method: GET
Response: Status of the API
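
A minimal client-side check for this endpoint might look like the following sketch. The base URL (for example `http://localhost:5000`) is an assumption, as the server's host and port are not stated here, and treating HTTP 200 as "healthy" is likewise an assumption because the response body format is not documented. The function names are hypothetical.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from a base URL (assumed, not documented)."""
    return base_url.rstrip("/") + "/api/health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /api/health answers with HTTP 200.

    Connection failures, timeouts, and HTTP error statuses all count as
    unhealthy. The response body is ignored because its schema is unknown.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```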

Analyze Document

URL: /api/analyze-document
Method: POST
Form Data:
- file: PDF file to analyze
- query (optional): Query to run against the document
Response: Analysis results

Company Data

URL: /api/company-data/<ticker>
Method: GET
URL Parameters: ticker (company stock symbol)
Response: Company information and market data
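
When building the request URL from user input, it helps to normalize and escape the ticker first. The sketch below is one way to do that; the base URL is an assumption, the function name is hypothetical, and uppercasing the ticker is a convenience chosen for this example, not a documented requirement of the API.

```python
import urllib.parse


def company_data_url(base_url: str, ticker: str) -> str:
    """Build the /api/company-data/<ticker> URL for a stock symbol."""
    symbol = ticker.strip().upper()  # assumption: the API expects uppercase symbols
    if not symbol:
        raise ValueError("ticker must not be empty")
    # Percent-encode anything that is not safe inside a URL path segment.
    encoded = urllib.parse.quote(symbol, safe="")
    return f"{base_url.rstrip('/')}/api/company-data/{encoded}"
```

The resulting URL can be fetched with any HTTP client using a GET request.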

Document Query

URL: /api/document-query
Method: POST
Form Data:
- file: PDF file to analyze
- query: Query to run against the document
- ticker (optional): Company ticker symbol for additional context
Response: Query results based on document content
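
Because this endpoint takes form data with a file upload, the request body has to be encoded as multipart/form-data. The sketch below builds such a body with the standard library only, using the field names `file`, `query`, and `ticker` listed above. The function names and the base URL are hypothetical, and the shape of the server's response is not documented here, so the raw bytes are returned as-is.

```python
import urllib.request
import uuid
from typing import Optional, Tuple


def build_document_query_form(
    pdf_bytes: bytes,
    filename: str,
    query: str,
    ticker: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode the form fields as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []

    def add_text(name: str, value: str) -> None:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    add_text("query", query)
    if ticker:  # optional field, sent only when provided
        add_text("ticker", ticker)

    # Escape double quotes so the filename cannot break the header.
    safe_name = filename.replace('"', "%22")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{safe_name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + pdf_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_document_query(base_url: str, body: bytes, content_type: str) -> bytes:
    """POST the encoded form to /api/document-query (base URL is an assumption)."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/document-query",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

If a third-party HTTP client is already a dependency, its built-in file upload support would replace the manual encoding shown here.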

Generate Report

URL: /api/generate-report
Method: POST
Query Parameters:
- format (optional): Output format (markdown or pdf, default: markdown)
Request Body: JSON with report data
Response: Generated report in markdown or PDF format
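
The sketch below shows how a client could assemble this request: the output format goes in the query string and the report data goes in the JSON body. The base URL and the function name are assumptions, and the fields inside the report data are not specified here, so the payload is passed through unchanged.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

# Formats named in the endpoint description.
ALLOWED_FORMATS = ("markdown", "pdf")


def build_report_request(
    base_url: str,
    report_data: Dict[str, Any],
    fmt: str = "markdown",
) -> urllib.request.Request:
    """Prepare a POST /api/generate-report request without sending it."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    query = urllib.parse.urlencode({"format": fmt})
    url = f"{base_url.rstrip('/')}/api/generate-report?{query}"
    body = json.dumps(report_data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; for `fmt="pdf"` the response body should be written to disk in binary mode.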

Extract Tickers

URL: /api/extract-tickers
Method: POST
Request Body: JSON with query field
Response: Extracted ticker symbols from query text

HarshXAI/FinityAi-Streamlit-Backend
