Skip to content

Calculate ratios#1

Open
psachdev wants to merge 3 commits intomainfrom
calculate_ratios
Open

Calculate ratios#1
psachdev wants to merge 3 commits intomainfrom
calculate_ratios

Conversation

@psachdev
Copy link
Owner

No description provided.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds two new Python scripts to calculate financial margins from SEC 10-K filings. Both scripts fetch company financial data and compute key margins (gross margin, operating margin, net profit margin, and EBITDA margin), but use different approaches:

Changes:

  • Added calculate_margins_xbrl.py which uses SEC's XBRL companyfacts API for deterministic data extraction
  • Added calculate_margins_llm.py which uses HTML parsing combined with DeepSeek LLM for financial statement extraction

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 27 comments.

File Description
calculate_margins_xbrl.py Implements deterministic margin calculation by querying SEC XBRL companyfacts API with fallback concept matching for various financial statement items
calculate_margins_llm.py Implements margin calculation by downloading 10-K HTML, extracting income statement tables, and using DeepSeek LLM to parse the financial data

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +6 to +10
python3 tenk_margins_xbrl.py TICKER [YEAR]

Examples:
python3 tenk_margins_xbrl.py AAPL 2023
python3 tenk_margins_xbrl.py PGR 2024 # insurers should also work
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage comment refers to 'tenk_margins_xbrl.py' but the actual filename is 'calculate_margins_xbrl.py'. This inconsistency will confuse users trying to run the script.

Suggested change
python3 tenk_margins_xbrl.py TICKER [YEAR]
Examples:
python3 tenk_margins_xbrl.py AAPL 2023
python3 tenk_margins_xbrl.py PGR 2024 # insurers should also work
python3 calculate_margins_xbrl.py TICKER [YEAR]
Examples:
python3 calculate_margins_xbrl.py AAPL 2023
python3 calculate_margins_xbrl.py PGR 2024 # insurers should also work

Copilot uses AI. Check for mistakes.

def main():
if len(sys.argv) < 2:
print("Usage: python tenk_margins.py TICKER [YEAR]")
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage example refers to 'tenk_margins.py' but the actual filename is 'calculate_margins_llm.py'. This inconsistency will confuse users trying to run the script.

Suggested change
print("Usage: python tenk_margins.py TICKER [YEAR]")
print("Usage: python calculate_margins_llm.py TICKER [YEAR]")

Copilot uses AI. Check for mistakes.
Send the income statement table to DeepSeek and ask for structured JSON.
Tries to be robust to non‑pure‑JSON responses.
"""
if not DEEPSEEK_API_KEY:
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DEEPSEEK_API_KEY is retrieved from environment variables without validation. If the key is an empty string (rather than None), the check on line 199 will pass but the API call will likely fail. Consider also checking for empty strings: 'if not DEEPSEEK_API_KEY or not DEEPSEEK_API_KEY.strip():'

Suggested change
if not DEEPSEEK_API_KEY:
if not DEEPSEEK_API_KEY or not str(DEEPSEEK_API_KEY).strip():

Copilot uses AI. Check for mistakes.
if require_fy:
fp = f.get("fp")
# 'FY' or sometimes 'Y'
if fp not in ("FY", "Y", None):
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fiscal period filter accepts 'FY', 'Y', or None. Accepting None when require_fy=True is contradictory - if fp is None, we cannot verify it's a full year fiscal period. This could lead to selecting facts from quarterly filings when a full year is required. Consider removing None from the accepted values when require_fy=True, or add a comment explaining why None is acceptable.

Suggested change
if fp not in ("FY", "Y", None):
if fp not in ("FY", "Y"):

Copilot uses AI. Check for mistakes.
Comment on lines +234 to +237
try:
return float(val), concept_name
except Exception:
continue
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare except clauses (except Exception) without logging or specific error handling silently ignore conversion errors. This makes debugging difficult when values cannot be converted to float. Consider logging the error or at least the problematic value before continuing to the next candidate.

Copilot uses AI. Check for mistakes.
'CONSOLIDATED STATEMENTS OF OPERATIONS', 'STATEMENTS OF INCOME', etc.
Return the HTML of that table as a string.
"""
soup = BeautifulSoup(html, "lxml")
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BeautifulSoup parser uses "lxml" which requires the lxml package to be installed. However, this dependency is not documented in the script's docstring or in installation instructions. If lxml is not installed, BeautifulSoup will fall back to a different parser (like html.parser), which could produce different results. Consider either documenting the lxml dependency or explicitly handling the case where lxml is not available.

Copilot uses AI. Check for mistakes.
Comment on lines +29 to +60
def get_ticker_cik_map() -> Dict[str, int]:
"""
Download the official SEC ticker -> CIK mapping JSON and return a dict
{ticker_upper: cik_int}.
"""
tickers_url = "https://www.sec.gov/files/company_tickers.json"

# Use proper headers for www.sec.gov
headers = {
"User-Agent": "MyResearchScript/1.0 your_email@example.com",
"Accept-Encoding": "gzip, deflate",
}

resp = requests.get(tickers_url, headers=headers)
resp.raise_for_status()
data = resp.json()

mapping = {}
# Format: { "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}, ...}
for _, entry in data.items():
ticker = entry["ticker"].upper()
cik = int(entry["cik_str"])
mapping[ticker] = cik
return mapping


def get_cik_for_ticker(ticker: str) -> int:
mapping = get_ticker_cik_map()
t = ticker.upper()
if t not in mapping:
raise ValueError(f"Ticker {ticker} not found in SEC ticker list.")
return mapping[t]
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions 'get_ticker_cik_map', 'get_cik_for_ticker', and 'get_10k_metadata' are duplicated between 'calculate_margins_xbrl.py' and 'calculate_margins_llm.py'. This code duplication violates the DRY (Don't Repeat Yourself) principle and makes maintenance harder. Consider extracting these common functions into a shared utility module that both scripts can import.

Copilot uses AI. Check for mistakes.
def call_deepseek_for_income_statement(table_text: str) -> Dict[str, Any]:
"""
Send the income statement table to DeepSeek and ask for structured JSON.
Tries to be robust to non‑pure‑JSON responses.
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring uses a non-standard hyphen character (U+2011 NON-BREAKING HYPHEN) in "non‑pure‑JSON" instead of a standard ASCII hyphen. While this doesn't break functionality, it's inconsistent with standard Python coding conventions. Consider using regular ASCII hyphens for consistency.

Suggested change
Tries to be robust to nonpureJSON responses.
Tries to be robust to non-pure-JSON responses.

Copilot uses AI. Check for mistakes.
Comment on lines +266 to +269
OPERATING_INCOME_CONCEPTS = [
"OperatingIncomeLoss",
"IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest",
"IncomeLossFromContinuingOperationsBeforeIncomeTaxesMinorityInterestAndIncomeLossFromEquityMethodInvestments",
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OPERATING_INCOME_CONCEPTS list includes "IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest" and "IncomeLossFromContinuingOperationsBeforeIncomeTaxesMinorityInterestAndIncomeLossFromEquityMethodInvestments" which represent income before taxes, not operating income. These concepts include interest and other non-operating items, making them different from operating income (which excludes interest and non-operating items). Using these as fallbacks for operating income will produce incorrect operating margins. Consider removing these concepts or adding a comment explaining why they are acceptable proxies in certain contexts.

Suggested change
OPERATING_INCOME_CONCEPTS = [
"OperatingIncomeLoss",
"IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest",
"IncomeLossFromContinuingOperationsBeforeIncomeTaxesMinorityInterestAndIncomeLossFromEquityMethodInvestments",
# Only include true operating income concepts here. Pre-tax income concepts
# (which include interest and other non-operating items) are intentionally
# excluded to avoid mis-stating operating margins.
OPERATING_INCOME_CONCEPTS = [
"OperatingIncomeLoss",

Copilot uses AI. Check for mistakes.
Convert an HTML table to a text representation suitable for LLM parsing.
We'll create a simple tab-separated format.
"""
soup = BeautifulSoup(table_html, "lxml")
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BeautifulSoup parser uses "lxml" but this dependency is not documented anywhere. Consider adding a note about required dependencies (requests, beautifulsoup4, lxml) in the module docstring.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant