Summary
graphify/extract.py is 11,657 lines and contains 45 language extractors in one file. graphify/__main__.py follows the same pattern at a smaller scale: a single 2,500-line main() with 40 command branches in one if/elif chain.
This is the largest maintenance risk in the repo. Reviewing a one-language change requires loading the entire file. Test isolation is harder than it should be. New contributors adding a 46th language have to navigate the whole file to find the right place.
Proposed shape
graphify/
    extractors/
        __init__.py      # registry: LANGUAGE_EXTRACTORS = {"python": PythonExtractor, ...}
        base.py          # shared Extractor protocol + common helpers
        python.py
        javascript.py
        typescript.py
        ...
        rust.py
    extract.py           # thin orchestrator: pick extractor by language, run it
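The layout above could be wired together roughly as follows. This is a minimal sketch, not graphify's actual code: the `extract(source) -> list[str]` method signature and the toy extractor bodies are assumptions made for illustration. Only the names `LANGUAGE_EXTRACTORS`, `PythonExtractor`, and `Extractor` come from the proposal itself.

```python
from typing import Protocol


class Extractor(Protocol):
    """Shape every per-language extractor would share (base.py in the layout).

    The method name and return type are hypothetical.
    """

    def extract(self, source: str) -> list[str]:
        ...


class PythonExtractor:
    """Toy stand-in for extractors/python.py: collects top-level function names."""

    def extract(self, source: str) -> list[str]:
        names = []
        for line in source.splitlines():
            if line.startswith("def "):
                names.append(line[len("def "):].split("(", 1)[0].strip())
        return names


class JavaScriptExtractor:
    """Toy stand-in for extractors/javascript.py."""

    def extract(self, source: str) -> list[str]:
        names = []
        for line in source.splitlines():
            if line.startswith("function "):
                names.append(line[len("function "):].split("(", 1)[0].strip())
        return names


# Would live in extractors/__init__.py: one entry per language module.
LANGUAGE_EXTRACTORS: dict[str, type[Extractor]] = {
    "python": PythonExtractor,
    "javascript": JavaScriptExtractor,
}


def extract(language: str, source: str) -> list[str]:
    """Thin orchestrator (extract.py): pick the extractor by language, run it."""
    try:
        extractor_cls = LANGUAGE_EXTRACTORS[language]
    except KeyError:
        raise ValueError(f"no extractor registered for {language!r}") from None
    return extractor_cls().extract(source)
```

With this shape, adding a 46th language would mean one new module plus one registry entry, and each extractor class can be imported and tested on its own.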
The same pattern applies to __main__.py.
Why this is worth doing now
File-level git blame is useless on extract.py; every commit touches the whole file.
A per-language file makes it obvious which languages are well-tested and which aren't.
__main__.py already produced #1207 ("Fix two latent bugs: merge-chunks output and manifest data loss"), a typo'd len() -- the kind of bug that's harder to spot in a 2.5k-line function than in a 30-line one.
Risk
Big diff. Not a one-PR job. Suggest doing it language-by-language: extract one extractor, prove the registry works, then move the others in batches.
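One way the language-by-language approach could be staged is to have the orchestrator prefer migrated extractors and fall back to the existing code path for everything else, with a parity check guarding each move. The sketch below is illustrative only: `legacy_extract`, `MIGRATED`, `check_parity`, and the function-based extractor signature are hypothetical names, and the toy bodies stand in for the real extraction logic.

```python
from typing import Callable

# Hypothetical signature: an extractor takes source text, returns symbol names.
ExtractFn = Callable[[str], list[str]]


def legacy_extract(language: str, source: str) -> list[str]:
    """Stand-in for the existing monolithic code path in extract.py."""
    keyword = {"python": "def ", "rust": "fn "}[language]
    return [
        line[len(keyword):].split("(", 1)[0]
        for line in source.splitlines()
        if line.startswith(keyword)
    ]


def extract_python(source: str) -> list[str]:
    """First extractor moved out into its own module."""
    return [
        line[len("def "):].split("(", 1)[0]
        for line in source.splitlines()
        if line.startswith("def ")
    ]


# Grows by one entry (or one batch) per PR.
MIGRATED: dict[str, ExtractFn] = {"python": extract_python}


def extract(language: str, source: str) -> list[str]:
    """Prefer a migrated extractor; otherwise use the legacy path unchanged."""
    migrated = MIGRATED.get(language)
    if migrated is not None:
        return migrated(source)
    return legacy_extract(language, source)


def check_parity(language: str, samples: list[str]) -> bool:
    """True if the migrated extractor matches the legacy output on every sample."""
    return all(
        MIGRATED[language](sample) == legacy_extract(language, sample)
        for sample in samples
    )
```

Each PR then stays small: move one extractor, add it to `MIGRATED`, run the parity check against existing fixtures, and delete the legacy branch once it passes.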
Scope check
Worth confirming with maintainer before anyone starts -- if there's a deliberate reason for the monolithic shape (e.g. cold-start time, single-file installability), close this and document the reason.
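For the __main__.py half of the proposal, the equivalent of the extractor registry is a command table that replaces the if/elif chain. A minimal sketch, assuming handlers take the remaining arguments and return an exit code; the command names `build` and `stats` are made up for illustration and are not taken from graphify's CLI.

```python
import sys
from typing import Callable

# Hypothetical handler shape: remaining CLI args in, exit code out.
Handler = Callable[[list[str]], int]

COMMANDS: dict[str, Handler] = {}


def command(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler under a command name."""

    def register(fn: Handler) -> Handler:
        COMMANDS[name] = fn
        return fn

    return register


@command("build")  # hypothetical command name
def cmd_build(args: list[str]) -> int:
    target = " ".join(args) or "."
    print(f"building {target}")
    return 0


@command("stats")  # hypothetical command name
def cmd_stats(args: list[str]) -> int:
    print(f"{len(args)} path(s) given")
    return 0


def main(argv: list[str]) -> int:
    """Look the command up in the table instead of walking an if/elif chain."""
    if not argv or argv[0] not in COMMANDS:
        choices = "|".join(sorted(COMMANDS))
        print(f"usage: prog [{choices}] [args...]", file=sys.stderr)
        return 2
    return COMMANDS[argv[0]](argv[1:])
```

Each handler becomes a short function that can live in its own module and be tested directly, for example `main(["stats", "a", "b"])`, without going through a 2,500-line main().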
Surfaced during an external code review pass.