Conversation
Pull request overview
This PR adds two new Python scripts to calculate financial margins from SEC 10-K filings. Both scripts fetch company financial data and compute key margins (gross margin, operating margin, net profit margin, and EBITDA margin), but use different approaches:
Changes:
- Added `calculate_margins_xbrl.py`, which uses SEC's XBRL companyfacts API for deterministic data extraction
- Added `calculate_margins_llm.py`, which uses HTML parsing combined with DeepSeek LLM for financial statement extraction
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 27 comments.
| File | Description |
|---|---|
| calculate_margins_xbrl.py | Implements deterministic margin calculation by querying SEC XBRL companyfacts API with fallback concept matching for various financial statement items |
| calculate_margins_llm.py | Implements margin calculation by downloading 10-K HTML, extracting income statement tables, and using DeepSeek LLM to parse the financial data |
```
python3 tenk_margins_xbrl.py TICKER [YEAR]

Examples:
python3 tenk_margins_xbrl.py AAPL 2023
python3 tenk_margins_xbrl.py PGR 2024  # insurers should also work
```
The usage comment refers to 'tenk_margins_xbrl.py' but the actual filename is 'calculate_margins_xbrl.py'. This inconsistency will confuse users trying to run the script.
Suggested change:

```diff
-python3 tenk_margins_xbrl.py TICKER [YEAR]
-Examples:
-python3 tenk_margins_xbrl.py AAPL 2023
-python3 tenk_margins_xbrl.py PGR 2024  # insurers should also work
+python3 calculate_margins_xbrl.py TICKER [YEAR]
+Examples:
+python3 calculate_margins_xbrl.py AAPL 2023
+python3 calculate_margins_xbrl.py PGR 2024  # insurers should also work
```
```python
def main():
    if len(sys.argv) < 2:
        print("Usage: python tenk_margins.py TICKER [YEAR]")
```
The usage example refers to 'tenk_margins.py' but the actual filename is 'calculate_margins_llm.py'. This inconsistency will confuse users trying to run the script.
| print("Usage: python tenk_margins.py TICKER [YEAR]") | |
| print("Usage: python calculate_margins_llm.py TICKER [YEAR]") |
```python
    Send the income statement table to DeepSeek and ask for structured JSON.
    Tries to be robust to non‑pure‑JSON responses.
    """
    if not DEEPSEEK_API_KEY:
```
The DEEPSEEK_API_KEY is retrieved from environment variables without validation. An empty string is falsy and would be caught, but a whitespace-only value is truthy, so the check on line 199 would pass and the API call would likely fail. Consider also rejecting blank values: 'if not DEEPSEEK_API_KEY or not DEEPSEEK_API_KEY.strip():'
Suggested change:

```diff
-    if not DEEPSEEK_API_KEY:
+    if not DEEPSEEK_API_KEY or not str(DEEPSEEK_API_KEY).strip():
```
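For illustration, a minimal sketch of the stricter check, assuming the key is read via `os.environ.get("DEEPSEEK_API_KEY", "")` (the exact retrieval in the script may differ):

```python
import os

# Assumed retrieval; the real script may read the variable differently.
DEEPSEEK_API_KEY = os.environ.get("DEEPSEEK_API_KEY", "")

def have_usable_key(key) -> bool:
    # Rejects None, "", and whitespace-only values alike.
    return bool(key and str(key).strip())

for candidate in (None, "", "   ", "sk-example"):
    print(repr(candidate), have_usable_key(candidate))
# None False, '' False, '   ' False, 'sk-example' True
```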
```python
        if require_fy:
            fp = f.get("fp")
            # 'FY' or sometimes 'Y'
            if fp not in ("FY", "Y", None):
```
The fiscal period filter accepts 'FY', 'Y', or None. Accepting None when require_fy=True is contradictory - if fp is None, we cannot verify it's a full year fiscal period. This could lead to selecting facts from quarterly filings when a full year is required. Consider removing None from the accepted values when require_fy=True, or add a comment explaining why None is acceptable.
Suggested change:

```diff
-            if fp not in ("FY", "Y", None):
+            if fp not in ("FY", "Y"):
```
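To make the behavioral difference concrete, here is a hedged sketch of a full-year filter using the stricter check; the fact fields (`fp`, `fy`, `val`) follow the SEC companyfacts format, but the surrounding function and sample values are illustrative only:

```python
# Hypothetical filter: keep only facts explicitly labelled as full fiscal years.
def full_year_facts(facts, require_fy=True):
    kept = []
    for f in facts:
        if require_fy and f.get("fp") not in ("FY", "Y"):
            continue  # drops quarterly ("Q1".."Q4") and unlabelled (None) facts
        kept.append(f)
    return kept

facts = [
    {"fp": "FY", "fy": 2023, "val": 383_285_000_000},
    {"fp": "Q2", "fy": 2023, "val": 94_836_000_000},
    {"fp": None, "fy": 2023, "val": 123_456},  # previously slipped through
]
print(len(full_year_facts(facts)))  # 1
```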
```python
            try:
                return float(val), concept_name
            except Exception:
                continue
```
Bare except clauses (except Exception) without logging or specific error handling silently ignore conversion errors. This makes debugging difficult when values cannot be converted to float. Consider logging the error or at least the problematic value before continuing to the next candidate.
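A minimal sketch of what the suggested logging could look like, wrapped in a hypothetical helper (`first_numeric_value` and its inputs are illustrative, not the script's actual function):

```python
import logging

logger = logging.getLogger(__name__)

def first_numeric_value(candidates):
    """Return (value, concept) for the first candidate that converts to float."""
    for concept_name, val in candidates:
        try:
            return float(val), concept_name
        except (TypeError, ValueError) as exc:
            # Log the offending value instead of silently skipping it.
            logger.debug("Skipping %s: cannot convert %r (%s)", concept_name, val, exc)
            continue
    return None, None

print(first_numeric_value([("Revenues", "N/A"), ("SalesRevenueNet", "1000")]))
# (1000.0, 'SalesRevenueNet')
```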
```python
    'CONSOLIDATED STATEMENTS OF OPERATIONS', 'STATEMENTS OF INCOME', etc.
    Return the HTML of that table as a string.
    """
    soup = BeautifulSoup(html, "lxml")
```
The BeautifulSoup parser uses "lxml", which requires the lxml package to be installed. However, this dependency is not documented in the script's docstring or in installation instructions. Because the parser is requested explicitly, BeautifulSoup will not silently fall back when lxml is missing; it raises bs4.FeatureNotFound and the script fails at this call. Consider documenting the lxml dependency, or explicitly falling back to 'html.parser' (noting that the two parsers can produce slightly different results).
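If the intent is to keep the script running without lxml, a fallback along these lines is one option (a sketch, not the script's current behavior):

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(html: str) -> BeautifulSoup:
    """Prefer lxml when installed; otherwise fall back to the stdlib parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # html.parser needs no extra dependency but may parse edge cases differently.
        return BeautifulSoup(html, "html.parser")
```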
```python
def get_ticker_cik_map() -> Dict[str, int]:
    """
    Download the official SEC ticker -> CIK mapping JSON and return a dict
    {ticker_upper: cik_int}.
    """
    tickers_url = "https://www.sec.gov/files/company_tickers.json"

    # Use proper headers for www.sec.gov
    headers = {
        "User-Agent": "MyResearchScript/1.0 your_email@example.com",
        "Accept-Encoding": "gzip, deflate",
    }

    resp = requests.get(tickers_url, headers=headers)
    resp.raise_for_status()
    data = resp.json()

    mapping = {}
    # Format: { "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}, ...}
    for _, entry in data.items():
        ticker = entry["ticker"].upper()
        cik = int(entry["cik_str"])
        mapping[ticker] = cik
    return mapping


def get_cik_for_ticker(ticker: str) -> int:
    mapping = get_ticker_cik_map()
    t = ticker.upper()
    if t not in mapping:
        raise ValueError(f"Ticker {ticker} not found in SEC ticker list.")
    return mapping[t]
```
The functions 'get_ticker_cik_map', 'get_cik_for_ticker', and 'get_10k_metadata' are duplicated between 'calculate_margins_xbrl.py' and 'calculate_margins_llm.py'. This code duplication violates the DRY (Don't Repeat Yourself) principle and makes maintenance harder. Consider extracting these common functions into a shared utility module that both scripts can import.
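One way to apply this suggestion is a small shared module imported by both scripts; the module name `sec_common.py` and the exact contents below are a hypothetical sketch (`get_10k_metadata` would move there as well):

```python
# sec_common.py -- hypothetical shared home for the duplicated SEC helpers.
from typing import Dict

import requests

SEC_HEADERS = {
    "User-Agent": "MyResearchScript/1.0 your_email@example.com",
    "Accept-Encoding": "gzip, deflate",
}

def get_ticker_cik_map() -> Dict[str, int]:
    resp = requests.get("https://www.sec.gov/files/company_tickers.json",
                        headers=SEC_HEADERS)
    resp.raise_for_status()
    return {e["ticker"].upper(): int(e["cik_str"]) for e in resp.json().values()}

def get_cik_for_ticker(ticker: str) -> int:
    mapping = get_ticker_cik_map()
    try:
        return mapping[ticker.upper()]
    except KeyError:
        raise ValueError(f"Ticker {ticker} not found in SEC ticker list.")

# calculate_margins_xbrl.py and calculate_margins_llm.py would then both use:
#     from sec_common import get_cik_for_ticker
```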
```python
def call_deepseek_for_income_statement(table_text: str) -> Dict[str, Any]:
    """
    Send the income statement table to DeepSeek and ask for structured JSON.
    Tries to be robust to non‑pure‑JSON responses.
```
The docstring uses a non-standard hyphen character (U+2011 NON-BREAKING HYPHEN) in "non‑pure‑JSON" instead of a standard ASCII hyphen. While this doesn't break functionality, it's inconsistent with standard Python coding conventions. Consider using regular ASCII hyphens for consistency.
Suggested change:

```diff
-    Tries to be robust to non‑pure‑JSON responses.
+    Tries to be robust to non-pure-JSON responses.
```
```python
OPERATING_INCOME_CONCEPTS = [
    "OperatingIncomeLoss",
    "IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest",
    "IncomeLossFromContinuingOperationsBeforeIncomeTaxesMinorityInterestAndIncomeLossFromEquityMethodInvestments",
```
The OPERATING_INCOME_CONCEPTS list includes "IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest" and "IncomeLossFromContinuingOperationsBeforeIncomeTaxesMinorityInterestAndIncomeLossFromEquityMethodInvestments" which represent income before taxes, not operating income. These concepts include interest and other non-operating items, making them different from operating income (which excludes interest and non-operating items). Using these as fallbacks for operating income will produce incorrect operating margins. Consider removing these concepts or adding a comment explaining why they are acceptable proxies in certain contexts.
Suggested change:

```diff
-OPERATING_INCOME_CONCEPTS = [
-    "OperatingIncomeLoss",
-    "IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest",
-    "IncomeLossFromContinuingOperationsBeforeIncomeTaxesMinorityInterestAndIncomeLossFromEquityMethodInvestments",
+# Only include true operating income concepts here. Pre-tax income concepts
+# (which include interest and other non-operating items) are intentionally
+# excluded to avoid mis-stating operating margins.
+OPERATING_INCOME_CONCEPTS = [
+    "OperatingIncomeLoss",
```
```python
    Convert an HTML table to a text representation suitable for LLM parsing.
    We'll create a simple tab-separated format.
    """
    soup = BeautifulSoup(table_html, "lxml")
```
The BeautifulSoup parser uses "lxml" but this dependency is not documented anywhere. Consider adding a note about required dependencies (requests, beautifulsoup4, lxml) in the module docstring.
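For example, the module docstring could spell out the third-party requirements along these lines (a sketch; the wording and the environment variable name are assumptions):

```python
"""
calculate_margins_llm.py

Compute gross, operating, net profit, and EBITDA margins from a company's
10-K by extracting the income statement table and parsing it with DeepSeek.

Third-party dependencies:
    pip install requests beautifulsoup4 lxml

    - requests        HTTP access to SEC endpoints and the DeepSeek API
    - beautifulsoup4  HTML parsing of the filing
    - lxml            parser backend used by BeautifulSoup(..., "lxml")

Environment:
    DEEPSEEK_API_KEY  API key used for the LLM extraction step
"""
```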