Add ATIF conversion pipeline#256
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands /iterate |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — the pipeline wires up correctly and all 596 tests pass. Three issues worth addressing before merge.
- is ATIF-in / ATIF-out (normalization only), but the name and the README pipeline diagram () imply it emits ADP std format. This naming mismatch will mislead future contributors. Rename it (or clarify in a comment and the diagram).
- does a wasteful round-trip through at the end; is sufficient since the copy is already a valid .
- The fallback in is a fragile heuristic — see inline comment.
[RISK ASSESSMENT]
- [Overall PR] Risk Assessment: 🟢 LOW — additive pipeline layer, backward-compatible, no existing behavior broken.
VERDICT: ✅ Worth merging with the heuristic fixed.
KEY INSIGHT: delegates to internally, so the ATIF stage currently inherits ADP normalization rather than preserving raw tool shapes — acceptable for now but worth a follow-up if per-dataset ATIF fidelity is ever needed.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26885141828
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — The three previously flagged issues are partially addressed: normalize_atif_trajectory no longer re-validates, and the schema_version fallback is now exception-based. Two new issues and one outstanding naming concern are below.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26885838005
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly and previous review items are resolved. One new correctness bug found.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26886419503
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly, all 596 tests pass, and previous review items are resolved. Two new issues below.
This review was generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26887170488
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly and all 596 tests pass. Two new issues found.
Was this automated review useful? React with 👍 or 👎 to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26888057010
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — previous review items are all resolved and all 596 tests pass. One new documentation inconsistency below.
AGENTS.md line 146 — SFT generation command doesn't use the ATIF pipeline
The pipeline diagram (line 30) shows the canonical path as sample_atif.json → scripts/atif_to_std.py → agents/*/std_to_sft.py → sample_sft/, but the "Generate sample files" command at line 146 still reads from sample_std.json directly:
cat datasets/$MY_DATASET/sample_std.json | ... | python agents/openhands_v0/std_to_sft.py ...Both paths produce identical output (since std_to_sft.py auto-detects format), but contributors following AGENTS.md will never exercise the new ATIF-based SFT generation path. Consider updating to:
cat datasets/$MY_DATASET/sample_atif.json | python scripts/json_to_jsonl.py | python scripts/atif_to_std.py | python agents/openhands_v0/std_to_sft.py --is_web=no --api_env=execute_bash | python scripts/jsonl_to_json.py > datasets/$MY_DATASET/sample_sft/openhands_v0.jsonWas this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26888896141
Co-authored-by: openhands <openhands@all-hands.dev>
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly, tests pass (596 passed, 97 skipped), and previous review items are resolved. One new issue found.
full_atif.jsonl is not gitignored — script_full.sh (line 9) generates datasets/$MY_DATASET/full_atif.jsonl, but .gitignore only covers full_raw.jsonl, full_std.jsonl, and full_sft.jsonl. A contributor who runs the full pipeline locally could accidentally commit large ATIF JSONL files. Add full_atif.jsonl to .gitignore beside the other full_*.jsonl patterns, and add it to the AGENTS.md "do not commit" list (line 44) alongside the existing full-corpus artifact names.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/26889796205
There was a problem hiding this comment.
🟡 Acceptable — both previously unresolved threads are resolved in the current code, and 602 tests pass. One minor documentation inconsistency found.
Resolved threads (verified in workspace):
agents/openhands_sdk/std_to_sft.py: noagenttuning_os-specific logic remains in the shared converter; correctly moved todatasets/agenttuning_os/std_to_sft.py.AGENTS.mdsample_std.json generation command now correctly routes throughatif_to_std.py.
CONTRIBUTING.md lines 28–39 — "Standardized Schema Components" section not updated
The pipeline diagram and generation commands were updated to ATIF, but this section still describes the primary standardized format as MessageAction/CodeAction/ApiAction/TextObservation/WebObservation. Since sample_std.json is now ATIF, new contributors reading top-down will have incorrect expectations about the schema. Consider updating this section to describe ATIF steps/tool_calls/observations, or add a note that the listed ADP types belong to the legacy raw_to_standardized.py compatibility path.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/27141347753
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly, both previously-unresolved threads are resolved in the current code, and 602 tests pass. No new correctness issues found.
Previously-unresolved threads (verified in workspace):
agents/openhands_sdk/std_to_sft.py: noagenttuning_os-specific logic in the shared converter; correctly moved todatasets/agenttuning_os/std_to_sft.py.AGENTS.md: generation commands correctly route throughraw_to_atif.py → atif_to_std.py.
Quick spot-checks:
agents/openhands_v0/api.py:execute_ipython_cellrequired/optionalkey typo fix is correct.scripts/atif_input.py: exception-based dispatch is clean; aschema_versionthat is neitherNonenorATIF_SCHEMA_VERSIONroutes toTrajectory(**data), which will raise a clear ADP version-mismatch error.agents/openhands_sdk/condensation_sft.py:json.loadsruns twice on the error path (once insideload_atif_aware_trajectory, once in theexceptblock). Not a correctness bug, but a minor cleanup opportunity.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/27142080328
There was a problem hiding this comment.
🟡 Acceptable — pipeline wires up correctly, all previously-raised issues are resolved, and 602 tests pass. One minor inconsistency found.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/27145041970
| import traceback | ||
| from pathlib import Path | ||
|
|
||
| sys.path.insert(0, str(Path(__file__).resolve().parents[2])) |
There was a problem hiding this comment.
🟡 Suggestion: sys.path.insert added here but not in sibling converters that have the same new import.
agents/agentlab/std_to_sft.py and agents/sweagent/std_to_sft.py both gained from scripts.atif_input import load_trajectory in this PR but have no corresponding sys.path.insert. Either add the same guard to those files for consistency, or remove it here (all commands in AGENTS.md already set PYTHONPATH).
There was a problem hiding this comment.
🟡 Acceptable — all three previously-unresolved threads are verified as resolved in the current code, and 602 tests pass. One minor new observation below.
Verified resolved threads:
agents/openhands_v0/std_to_sft.py:10(sys.path.insert): all four sibling converters (openhands_v0,agentlab,sweagent,openhands_sdk) now include thesys.path.insertcall consistently.agents/openhands_sdk/std_to_sft.py: noagenttuning_os-specific logic in the shared converter.AGENTS.md: generation commands correctly route throughraw_to_atif.py → atif_to_std.py, and the SFT step reads fromsample_std.json(which is now ATIF format).
Minor new observation — scripts/atif_input.py version dispatch:
The dispatch uses exact equality (== ATIF_SCHEMA_VERSION). If a future ATIF version (e.g., "ATIF-v1.8") is introduced, records with the old "ATIF-v1.7" version string fall through to return Trajectory(**data) (ADP path) and fail with a confusing Pydantic ValidationError rather than a clear "unsupported ATIF version" message. The ATIFTrajectory field validator already produces a clean error, so routing any "ATIF-" prefixed schema_version through the ATIF parse path would give a better failure signal. Not a blocker.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/neulab/agent-data-protocol/actions/runs/27145768889
| import sys | ||
| from pathlib import Path | ||
|
|
||
| sys.path.insert(0, str(Path(__file__).resolve().parents[2])) |
There was a problem hiding this comment.
This import style should not be necessary anywhere in the repo
|
@OpenHands since we are now relying on the ATIF schema, we can delete the ADP schema. Do so and adjust all scripts accordingly |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed the ADP schema removal request in ff580a6. Summary:
Validation:
This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
@OpenHands Can raw_to_standardized.py be deleted now given that the method for creating standardized format is now raw_to_atif and atif_to_std? Also, check to make sure that the changes to the resulting sft data are minimal compared to when the sft data was generated from the legacy adp format. |
|
I'm on it! neubig can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed the
SFT comparison against the legacy ADP-generated data on
Changed SFT files observed in the existing ATIF PR output: Validation run:
Note: I first attempted This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
Since my last summary, there were no additional code changes or follow-up actions. Final status:
Conciseness:
The work is complete. |
Co-authored-by: openhands <openhands@all-hands.dev>
|
I audited and regenerated the samples in Answer: no, the committed What changed:
SFT audit versus
So the regenerated output is not minimal compared to legacy ADP output. The earlier audit showing 76/86 unchanged was against the pre-regeneration artifacts; after forcing regeneration through the ATIF path, the deltas are much larger. The biggest count changes are in flattened web/AgentLab-style samples and datasets whose ATIF sample now preserves all raw sample rows (for example Validation run:
Dependency note: full This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
Summary
raw_to_atif.pyindependent of the legacy ADP converters; the shared raw converter now maps common raw chat/tool-call shapes directly into ATIF and preserves raw tool names before normalization.atif_to_std.pyentrypoints for ATIF-to-ATIF tool normalization, backed by shared normalization utilities for common shell/Python/code aliases.std_to_sft.pyconverters (OpenHands v0, SWE-agent, AgentLab, OpenHands SDK) to accept ATIF input through a shared loader.Fixes #243.
Tests
PATH="$HOME/.local/bin:$PATH" pre-commit run --all-filespython -m pytest tests/test_atif_schemas.py tests/test_dataset_structure.py tests/test_std_to_sft_conversion.py -q(254 passed, 12 skipped)PATH="$HOME/.local/bin:$PATH" python -m pytest tests/ -q(602 passed, 97 skipped)Notes / design decisions
raw_to_atif.pyno longer shells out toraw_to_standardized.pyor adapts through ADP. It performs a lightweight raw transcript/tool-call projection into ATIF and leaves cross-dataset tool smoothing to the next stage.atif_to_std.pywrapper. The current wrappers share common ATIF normalization logic, so datasets with no special smoothing are effectively no-ops beyond common alias cleanup; future dataset-specific scripts can diverge where needed.atif_to_std.pyemits ATIF-compatible JSONL, not legacy ADPsample_std.jsonrecords.This PR description was updated by an AI agent (OpenHands) on behalf of the user.