[Pyrefly][Github actions] Add Two-pass LLM classification with PR diff attribution for primer classification #2539

Draft
migeed-z wants to merge 4 commits into main from two-pass-llm-classification

migeed-z (Contributor) commented Feb 24, 2026

This is another iteration on our mypy primer classifier work. It fixes one bug and makes a few improvements. Specifically:

  • The verdict sometimes contradicts the message.
    Solution: separate the concerns. One pass analyzes the diff and produces the message; a second, lightweight pass reads the message and determines the verdict.
  • Include PR information so the classifier can explain how the PR's code changes contributed to the error changes.
  • Linkify and improve the formatting of messages. We now have a table that describes the errors per project, as well as a high-level overall comment with suggested next steps.

…ifier

Split LLM classification into two passes to fix verdict-reasoning
contradictions (4/26 in PR #2493). Pass 1 produces reasoning and
PR attribution without a verdict. Pass 2 reads the reasoning and
assigns the verdict. This separates code analysis (hard) from
labeling (easy), eliminating cases where the LLM commits to a
verdict early and writes contradictory reasoning.
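
A minimal sketch of the two-pass flow described above, for readers skimming the thread. The names `classify_two_pass`, `LLMCall`, and `Classification`, the prompt wording, and the verdict label set are all assumptions for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that sends a prompt to an LLM and returns text.
LLMCall = Callable[[str], str]

# Assumed label set; the real classifier may use different verdicts.
VERDICTS = ("improvement", "regression", "neutral")


@dataclass
class Classification:
    reasoning: str
    verdict: str


def classify_two_pass(error_diff: str, call_llm: LLMCall) -> Classification:
    # Pass 1: analyze the primer diff and explicitly forbid a verdict.
    reasoning = call_llm(
        "Explain what changed in these type-checker errors and why. "
        "Do not state a verdict.\n\n" + error_diff
    )
    # Pass 2: label the reasoning only; the raw diff is not shown again.
    raw = call_llm(
        "Read the analysis below and answer with exactly one word: "
        + ", ".join(VERDICTS) + ".\n\n" + reasoning
    )
    verdict = raw.strip().lower()
    if verdict not in VERDICTS:
        # Fall back rather than trust a malformed label.
        verdict = "neutral"
    return Classification(reasoning=reasoning, verdict=verdict)
```

Because pass 2 only sees the text from pass 1, the verdict cannot drift away from the reasoning the way a single combined prompt allows.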

Also adds --pyrefly-diff CLI flag to include the pyrefly PR code
diff in each LLM call, enabling per-project attribution of which
code change caused errors to appear or disappear.
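
As a rough illustration of how a flag like `--pyrefly-diff` could feed the diff into each per-project call: the sketch below assumes the flag takes a file path, and `build_parser` and `build_prompt` are hypothetical helpers, not functions from this PR.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Classify primer results (sketch)")
    # Assumption: the flag takes a path to a file containing the PR diff.
    parser.add_argument(
        "--pyrefly-diff",
        type=Path,
        default=None,
        help="path to the pyrefly PR code diff",
    )
    return parser


def build_prompt(project_errors: str, pr_diff: Optional[str]) -> str:
    prompt = "Primer error changes for this project:\n" + project_errors
    if pr_diff:
        # The same PR diff is appended to every per-project call so the
        # model can attribute each error change to a specific code change.
        prompt += "\n\nPyrefly PR diff (attribute each change to a hunk):\n" + pr_diff
    return prompt
```

When the flag is omitted the prompt is unchanged, so existing invocations keep working.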

meta-cla bot added the cla signed label Feb 24, 2026
migeed-z marked this pull request as draft February 24, 2026 22:35
migeed-z changed the title from "Two-pass LLM classification with PR diff attribution for primer class…" to "[Pyrefly][Github actions] Add Two-pass LLM classification with PR diff attribution for primer classification" Feb 24, 2026
meta-codesync bot commented Feb 24, 2026

@migeed-z has imported this pull request. If you are a Meta employee, you can view this in D94280120.

Restructure format_markdown() to show an overview table with linked
function names and file paths, collapsible detailed analysis, and a
suggested fix section. Add helpers for function-name linkification
and root cause extraction from PR attribution text.
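
One way the overview table, linked function names, and collapsible analysis could fit together is sketched below. The helper names, the column layout, and the idea of linking to a code-search URL passed in as `search_base` are assumptions; the actual `format_markdown()` output may differ.

```python
from typing import List, Tuple
from urllib.parse import quote


def linkify_function(name: str, search_base: str) -> str:
    """Render a function name as a markdown link to a code search (assumed target)."""
    return f"[`{name}()`]({search_base}{quote(name)})"


def format_overview(rows: List[Tuple[str, str, str]], search_base: str) -> str:
    """rows are (project, verdict, function_name); returns a markdown table."""
    lines = ["| Project | Verdict | Related function |", "| --- | --- | --- |"]
    for project, verdict, func in rows:
        lines.append(
            f"| {project} | {verdict} | {linkify_function(func, search_base)} |"
        )
    return "\n".join(lines)


def collapsible(title: str, body: str) -> str:
    """Wrap detailed analysis in a GitHub-rendered <details> block."""
    return f"<details>\n<summary>{title}</summary>\n\n{body}\n\n</details>"
```

The blank lines inside the `<details>` wrapper matter: GitHub only renders markdown inside the block when it is separated from the HTML tags.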

Add --suggest CLI flag, Suggestion/SuggestionResult dataclasses, and
generate_suggestions() LLM client that produces actionable source code
fix suggestions from classification results and the PR diff.
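
For orientation, the two dataclasses might look roughly like this. Only the class names `Suggestion` and `SuggestionResult` come from the commit message; every field and the `render_suggestions` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Suggestion:
    # Field names are assumptions for illustration, not the PR's definition.
    file_path: str       # source file the fix would touch
    description: str     # what to change
    rationale: str = ""  # why, tied back to the classification


@dataclass
class SuggestionResult:
    summary: str
    suggestions: List[Suggestion] = field(default_factory=list)


def render_suggestions(result: SuggestionResult) -> str:
    """Render suggestions as a markdown section (hypothetical helper)."""
    lines = ["### Suggested fix", "", result.summary]
    for s in result.suggestions:
        lines.append(f"- `{s.file_path}`: {s.description}")
    return "\n".join(lines)
```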

Use a stricter regex (_INTERNAL_FUNCTION_PATTERN) that requires
underscores to distinguish pyrefly internal function names like
check_for_imported_final_reassignment() from common Python method
names like get(), match(), set() that appear in error messages.
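
A sketch of what an underscore-requiring pattern could look like. The name `_INTERNAL_FUNCTION_PATTERN` is from the commit message, but the regex itself and the `find_internal_functions` helper are assumptions, not the pattern in this PR.

```python
import re
from typing import List

# Assumed pattern: a lowercase snake_case identifier that contains at least
# one underscore, immediately followed by "()".
_INTERNAL_FUNCTION_PATTERN = re.compile(r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\(\)")


def find_internal_functions(text: str) -> List[str]:
    """Return candidate internal function names mentioned in text."""
    return _INTERNAL_FUNCTION_PATTERN.findall(text)
```

With this pattern, `check_for_imported_final_reassignment()` is matched, while single-word names such as `get()`, `match()`, and `set()` are skipped because they contain no underscore.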