Omniparser: Historical Message Conversion Uses Incorrect Single-Screenshot Mapping #694

The Omniparser agent loop builds a **single mapping table**, derived from only the most recent screenshot, and uses it to convert **all historical `computer_call` messages** to `function_call` format. Because the UI can change between screenshots, coordinates from earlier calls may resolve to the wrong element ID, or to no element at all.
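A toy illustration of the failure mode, assuming the coordinate-to-ID lookup is an exact dictionary match (as the `xy2id` inversion in the code references suggests). The element maps and coordinates here are made up for the example and are not taken from a real Omniparser run:

```python
# Hypothetical element maps (element ID -> centre coordinates) for two screenshots.
id2xy_before = {1: (100, 200), 2: (300, 400)}  # screenshot the agent saw when it clicked
id2xy_latest = {1: (300, 400), 2: (500, 600)}  # latest screenshot, after the UI changed

# A historical click that targeted element 2 in the earlier screenshot.
historical_click = (300, 400)

# Invert both maps the same way the loop does for the latest screenshot.
xy2id_before = {xy: element_id for element_id, xy in id2xy_before.items()}
xy2id_latest = {xy: element_id for element_id, xy in id2xy_latest.items()}

right_id = xy2id_before.get(historical_click)  # 2: the element actually clicked
wrong_id = xy2id_latest.get(historical_click)  # 1: a different element in the latest UI
missing = xy2id_latest.get((100, 200))         # None: position no longer has an element
```

The same coordinates yield element 2 under the screenshot the action was based on, but element 1 under the latest one, so the converted history would describe a click that never happened.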

### Impact
- Inaccurate historical context for the LLM
- Wrong element IDs assigned to past interactions
- Potential task execution failures due to incorrect context
- Harder debugging, because the converted history does not reflect the elements that were actually targeted

### Code References

- `/cua/libs/python/agent/agent/loops/omniparser.py`

#### Only processes latest screenshot:
```python
# In predict_step(), lines 318-320
last_computer_call_output = get_last_computer_call_output(messages)
if last_computer_call_output:
    image_url = last_computer_call_output.get("output", {}).get("image_url", "")
    # Only processes this single screenshot
```

#### Uses single mapping for all messages:
```python
# Lines 340-352
xy2id = {v: k for k, v in id2xy.items()}  # Single mapping from latest screenshot
messages_with_element_ids = []
for i, message in enumerate(messages):
    # ...
    converted = await replace_computer_call_with_function(message, xy2id)  # Same mapping for all
```
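One possible direction for a fix is to convert each `computer_call` with the mapping of the screenshot that preceded it, rather than the latest one. The sketch below is not the repository's implementation: `convert_history`, the `build_xy2id` callback, the `action` field layout, and the shape of the emitted `function_call` are all assumptions made for illustration, and it is synchronous for simplicity even though the real conversion is awaited.

```python
from typing import Any, Callable, Dict, List, Tuple

XY = Tuple[int, int]
Message = Dict[str, Any]


def convert_history(
    messages: List[Message],
    build_xy2id: Callable[[str], Dict[XY, int]],
) -> List[Message]:
    """Convert computer_call messages using the mapping of the preceding screenshot.

    build_xy2id is a hypothetical callback that parses one screenshot (by image
    URL) and returns its coordinate -> element ID mapping.
    """
    cache: Dict[str, Dict[XY, int]] = {}  # parse each screenshot at most once
    current: Dict[XY, int] = {}           # mapping of the screenshot seen so far
    converted: List[Message] = []

    for message in messages:
        kind = message.get("type")
        if kind == "computer_call_output":
            image_url = message.get("output", {}).get("image_url", "")
            if image_url:
                if image_url not in cache:
                    cache[image_url] = build_xy2id(image_url)
                current = cache[image_url]
            converted.append(message)
        elif kind == "computer_call":
            action = message.get("action", {})  # assumed layout: {"type", "x", "y"}
            element_id = current.get((action.get("x"), action.get("y")))
            if element_id is None:
                converted.append(message)  # no match: leave the call unchanged
            else:
                converted.append({
                    "type": "function_call",  # assumed output shape
                    "name": "computer",
                    "arguments": {"action": action.get("type"), "element_id": element_id},
                })
        else:
            converted.append(message)
    return converted
```

With two screenshots in which the same coordinates belong to different elements, each historical click is resolved against its own screenshot:

```python
fake_maps = {"shot_a": {(10, 20): 1}, "shot_b": {(10, 20): 7}}
history = [
    {"type": "computer_call_output", "output": {"image_url": "shot_a"}},
    {"type": "computer_call", "action": {"type": "click", "x": 10, "y": 20}},
    {"type": "computer_call_output", "output": {"image_url": "shot_b"}},
    {"type": "computer_call", "action": {"type": "click", "x": 10, "y": 20}},
]
result = convert_history(history, lambda url: fake_maps[url])
```

Caching by image URL keeps the cost to one parse per distinct screenshot, which matters if parsing is as expensive as a model call.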



