feat(extract): add Salesforce Apex extractor (.cls, .trigger)#1159

Closed
mishrajeev wants to merge 1 commit into Graphify-Labs:v8 from mishrajeev:feature/salesforce-apex-extractor

@mishrajeev

Adds regex-based extraction support for Salesforce Apex, a Java-like language used for server-side logic on the Salesforce platform. No tree-sitter-apex grammar exists on PyPI, so this follows the same regex-based approach used for Pascal, Razor, and .NET project files.

Extracts from .cls files:

  • Classes (public/global/private, with/without sharing, abstract, virtual)
  • Inner interfaces and enums
  • Methods at all access levels, including annotated methods (@AuraEnabled, @future, @InvocableMethod, @isTest)
  • Inheritance: extends → extends edge, implements → implements edge
  • SOQL queries: [SELECT ... FROM SObject] → uses edge to the SObject
  • DML operations: insert/update/delete/upsert/merge/undelete → uses edges
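
For readers unfamiliar with the regex approach, here is a minimal sketch of how class, inheritance, and SOQL extraction along these lines might look. The patterns, the `sketch_apex_edges` name, and the tuple-shaped edges are illustrative assumptions, not the code in graphify/extract.py. DML handling, generic interface types, nested subqueries, and comment stripping are deliberately left out.

```python
import re

# Hypothetical patterns; the PR's actual regexes are not shown in this thread.
CLASS_RE = re.compile(
    r"\b(?:public|global|private)\s+"
    r"(?:(?:abstract|virtual|(?:with|without|inherited)\s+sharing)\s+)*"
    r"class\s+(?P<name>\w+)"
    r"(?:\s+extends\s+(?P<base>[\w.]+))?"
    r"(?:\s+implements\s+(?P<ifaces>[\w.,\s]+?))?"
    r"\s*\{",
    re.IGNORECASE,
)
# Captures the first SObject named after FROM inside a bracketed query.
SOQL_RE = re.compile(
    r"\[\s*SELECT\b.*?\bFROM\s+(?P<sobject>\w+)",
    re.IGNORECASE | re.DOTALL,
)


def sketch_apex_edges(source: str) -> list[tuple[str, str, str]]:
    """Return (source_node, relation, target_node) tuples for one Apex class."""
    edges: list[tuple[str, str, str]] = []
    match = CLASS_RE.search(source)
    if match is None:
        return edges
    cls = match.group("name")
    if match.group("base"):
        edges.append((cls, "extends", match.group("base")))
    for iface in (match.group("ifaces") or "").split(","):
        iface = iface.strip()
        if iface:
            edges.append((cls, "implements", iface))
    for query in SOQL_RE.finditer(source):
        edges.append((cls, "uses", query.group("sobject")))
    return edges


SAMPLE = """
public with sharing class AccountService extends BaseService implements Queueable, Schedulable {
    public List<Account> load() {
        return [SELECT Id, Name FROM Account WHERE Name != null];
    }
}
"""

if __name__ == "__main__":
    for edge in sketch_apex_edges(SAMPLE):
        print(edge)
```

The modifier group is a repeated alternation so that `abstract`, `virtual`, and the sharing keywords are accepted in any order before `class`.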

Extracts from .trigger files:

  • Trigger declaration name + the SObject it fires on → uses edge
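
The trigger case can be sketched the same way. The pattern and the `sketch_trigger_edge` helper below are hypothetical and only show the idea of pairing the trigger name with the SObject it fires on; a real extractor would also need to ignore matches inside comments and string literals.

```python
import re

# Hypothetical pattern for `trigger <Name> on <SObject> (<events>)`.
TRIGGER_RE = re.compile(
    r"\btrigger\s+(?P<name>\w+)\s+on\s+(?P<sobject>\w+)\s*\(",
    re.IGNORECASE,
)


def sketch_trigger_edge(source: str):
    """Return (trigger_name, "uses", sobject), or None if no declaration is found."""
    match = TRIGGER_RE.search(source)
    if match is None:
        return None
    return (match.group("name"), "uses", match.group("sobject"))


if __name__ == "__main__":
    sample = "trigger AccountTrigger on Account (before insert, after update) {\n}"
    print(sketch_trigger_edge(sample))
```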

Changes:

  • graphify/extract.py: add extract_apex(); register .cls/.trigger in _DISPATCH
  • graphify/detect.py: add .cls and .trigger to CODE_EXTENSIONS
  • tests/fixtures/sample.cls: Apex class fixture covering all extracted constructs
  • tests/fixtures/sample.trigger: Apex trigger fixture on Account SObject
  • tests/test_languages.py: 12 new tests (class, enum, interface, method, contains/method relations, SOQL uses edge, DML uses edge, file node, trigger extraction, trigger SObject link, missing-file safety, no-dangling-edges invariant)
  • README.md: add .cls and .trigger to the supported extensions table

All 12 tests pass. No regressions in existing suite.

safishamsi added a commit that referenced this pull request Jun 7, 2026
… features)

#1118 — prune stale AST nodes on full re-extraction (#1116)
Stamps every AST-extracted node with _origin="ast" in extract(). On a
full rebuild _rebuild_code drops any AST-marked node absent from the
fresh output even when its source file survives, fixing stale symbols.
Backward-compat: marker-less nodes from pre-1118 graphs survive one
cycle then self-heal.
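
A minimal sketch of the pruning rule described for #1118, assuming nodes are held as a dict of id to attribute dict. The `prune_stale` name and this data layout are hypothetical; only the `_origin="ast"` marker comes from the commit message. Marker-less nodes are kept here, and the later self-heal path is not modelled.

```python
def prune_stale(old_nodes: dict, fresh_nodes: dict) -> dict:
    """Merge a fresh AST extraction into an existing node map.

    AST-marked nodes missing from the fresh output are dropped;
    nodes without the marker are kept untouched.
    """
    merged = {}
    for node_id, attrs in old_nodes.items():
        if attrs.get("_origin") == "ast" and node_id not in fresh_nodes:
            continue  # stale symbol: previously extracted, now gone
        merged[node_id] = attrs
    for node_id, attrs in fresh_nodes.items():
        # Stamp every freshly extracted node so it can be pruned later.
        merged[node_id] = {**attrs, "_origin": "ast"}
    return merged


if __name__ == "__main__":
    old = {
        "Foo.bar": {"_origin": "ast"},
        "Foo.removed": {"_origin": "ast"},
        "Legacy.node": {},  # written before the marker existed
    }
    fresh = {"Foo.bar": {"label": "bar"}}
    print(sorted(prune_stale(old, fresh)))
```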

#1110 — stop reading images and PDFs as garbage in headless extract
Images route through per-backend vision payloads (base64/data-URI/bytes
for claude/openai/bedrock); non-vision backends get _strip_pixels for
graceful degradation. PDFs reuse pypdf. 5MB cap, 20-image chunk limit.

#1159 — Salesforce Apex extractor (.cls, .trigger)
Regex-based extractor: classes, interfaces, enums, methods, triggers,
SOQL/DML edges. No new dependency. Dispatched as .cls and .trigger.

#1107 — Azure OpenAI Service backend (--backend azure)
Uses AzureOpenAI SDK client (from existing openai package). Auto-detects
when AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT both set. Uses
max_completion_tokens (not deprecated max_tokens).

#1103 — live PostgreSQL introspection (--postgres DSN)
graphify extract --postgres "postgresql://..." introspects tables, views,
routines, and FK relations via information_schema (SERIALIZABLE READ ONLY).
Credentials sanitized on error. New graphify[postgres] extra (psycopg3).

Union-resolved llm.py conflict: Azure functions + bedrock images= param.
Fixed test_image_vision.py mock to accept timeout= kwarg (our #1112).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@safishamsi

Collaborator

Landed in 7467c1b.

What it adds: Regex-based AST extractor for Salesforce Apex (.cls, .trigger). No tree-sitter grammar exists on PyPI for Apex so this follows the Pascal/Razor pattern. Extracts: classes (including with sharing/without sharing/global), interfaces, enums, methods, triggers, plus INFERRED edges for SOQL sObjects and DML operations.

12 tests pass on real fixtures. No new dependencies.

1910 passed, 0 failures.

@safishamsi safishamsi closed this Jun 7, 2026