Add mouse position into env observation by ryanhoangt · Pull Request #282 · ServiceNow/BrowserGym

ryanhoangt · 2024-11-28T15:22:10Z

Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions from browsergym, and it would be useful if the environment exposed the mouse position via the observation. What does the team think about this?

One quirk: there seems to be no direct way to get the mouse position from Playwright, so I use a somewhat hacky way to get that info.

gasse

Nice feature! See my comments to make it robust to iFrames

gasse · 2024-12-03T18:38:14Z

browsergym/core/src/browsergym/core/env.py

 window.addEventListener("load", () => {window.browsergym_page_activated();}, {capture: true});
 window.addEventListener("pageshow", () => {window.browsergym_page_activated();}, {capture: true});
-window.addEventListener("mousemove", () => {window.browsergym_page_activated();}, {capture: true});
+window.addEventListener("mousemove", (event) => {window.browsergym_page_activated(); window.pageX = event.clientX; window.pageY = event.clientY;}, {capture: true});


Clean and simple, I like this
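
A minimal sketch of how the values stored by the mousemove listener above could be read back from Python. It assumes a Playwright-style `page` object exposing `evaluate()`. The helper name `read_mouse_position` and the fallback to 0 when no mousemove has fired yet are illustrative assumptions, not the PR's actual code.

```python
from typing import Any, Tuple

# JS snippet: read the coordinates the mousemove listener stored on `window`.
# The `?? 0` fallback (no mousemove seen yet) is an assumption of this sketch.
_READ_MOUSE_JS = "() => [window.pageX ?? 0, window.pageY ?? 0]"


def read_mouse_position(page: Any) -> Tuple[float, float]:
    """Return the last recorded (x, y) mouse position for `page`.

    `page` is expected to behave like a Playwright sync `Page`, i.e. to
    expose `evaluate(expression)`; any object with that method works.
    """
    x, y = page.evaluate(_READ_MOUSE_JS)
    return (x, y)
```

Because the helper only relies on `evaluate()`, it can be exercised with a stub object in unit tests before trying it against a real browser.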

gasse · 2024-12-03T18:47:34Z

browsergym/core/src/browsergym/core/observation.py

+    Returns:
+        An array of the x and y coordinates of the mouse location.
+    """
+    position = page.evaluate("""() => {


This will work for simple pages, but I'm worried about iframes. Here is something that could work:

In the JS callback (mousemove), record the position in JS on the window object, and also record in Python which page / frame received this event, with a method similar to _activate_page_from_js().

To extract the mouse position in the browser viewport, take the latest mouse position (from the last iframe that received a mousemove event), and work your way up the frame hierarchy to reconstruct the current mouse position. See how we do that to get the coordinates of all elements in all iframes here:

https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/core/src/browsergym/core/observation.py#L293-L377
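
The "work your way up the frame hierarchy" step can be sketched in plain Python as below. This is only an illustration under stated assumptions: `FrameInfo` is a hypothetical record, and each offset is assumed to be the top-left corner of the frame's iframe element in its parent's viewport (for instance, taken from the element's bounding rectangle). Borders, padding and CSS transforms on the iframe are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FrameInfo:
    """Hypothetical record for one frame in the hierarchy.

    offset_x / offset_y: top-left corner of the frame's <iframe> element,
    expressed in the parent frame's viewport coordinates. The main frame
    has no parent and an offset of (0, 0).
    """

    offset_x: float = 0.0
    offset_y: float = 0.0
    parent: Optional["FrameInfo"] = None


def to_top_level_viewport(x: float, y: float, frame: FrameInfo) -> Tuple[float, float]:
    """Translate (x, y), local to `frame`, into main-viewport coordinates."""
    current: Optional[FrameInfo] = frame
    while current is not None:
        # Shift by the position of this frame inside its parent, then go up.
        x += current.offset_x
        y += current.offset_y
        current = current.parent
    return (x, y)


# Example hierarchy: main frame > outer iframe > inner iframe.
main_frame = FrameInfo()
outer_frame = FrameInfo(offset_x=100, offset_y=50, parent=main_frame)
inner_frame = FrameInfo(offset_x=20, offset_y=30, parent=outer_frame)
position = to_top_level_viewport(5, 5, inner_frame)
```

Here a mousemove at (5, 5) inside the inner iframe is shifted by each enclosing iframe's offset in turn until the main frame is reached.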

gasse · 2024-12-03T18:49:05Z

tests/core/test_actions_highlevel.py


    obs, reward, term, trunc, info = env.step(action)
    checkbox = get_checkbox_elem(obs)
+    assert obs['mouse_position'] == [x, y]


That's a good test, can you do the same for other pages which have iFrames, and check you get the correct coordinates when clicking on elements inside the iframe? (clicking with coordinates, and with bid)

gasse · 2024-12-03T19:01:10Z

BTW, a cool way to try this feature is to run an openended agent on a whiteboard and ask it to draw simple forms, like we did for the demo video here
https://github.com/ServiceNow/BrowserGym/

gasse · 2024-12-03T20:00:57Z

Seems like there is pageX, pageY but also clientX, clientY
https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger

https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/clientX
https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/pageX

Only way to know how / which one of these to use is to write some tests :)

ryanhoangt · 2024-12-06T10:05:27Z

Seems like there is pageX, pageY but also clientX, clientY
https://michaelwornow.net/2024/01/02/display-x-y-coords-chrome-debugger

From the blog, it seems that clientX/clientY is relative to the viewport, and pageX/pageY is relative to the whole web page. I think clientX/clientY is closer to what we want 🤔
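
To make the difference concrete: for a single document, the two coordinate systems differ by the scroll offset, so pageX = clientX + scrollX (and likewise for Y). The sketch below expresses that relation; the function names are illustrative, and effects such as pinch-zoom or nested iframes are deliberately left out.

```python
from typing import Tuple


def client_to_page(
    client_x: float, client_y: float, scroll_x: float, scroll_y: float
) -> Tuple[float, float]:
    """Viewport-relative coordinates -> document-relative coordinates."""
    return (client_x + scroll_x, client_y + scroll_y)


def page_to_client(
    page_x: float, page_y: float, scroll_x: float, scroll_y: float
) -> Tuple[float, float]:
    """Document-relative coordinates -> viewport-relative coordinates."""
    return (page_x - scroll_x, page_y - scroll_y)
```

On an unscrolled page both systems agree, which is why a test on a short page cannot tell them apart; a test that scrolls first would.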

recursix · 2025-07-16T20:22:34Z

I would like to move forward with this, but the current code will not work universally.
Could we iterate on this? @ryanhoangt, are you still interested in working on this?
This chat with Claude is inspiring. It seems like we would need to update all action functions in bgym so that each one updates a global variable containing the appropriate info.

It might not be the best solution, but we could iterate on it.
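
One way to picture the "action functions update a global variable" idea is a small decorator, sketched below. Everything here is hypothetical: `tracks_mouse`, `get_last_mouse_position` and the placeholder `mouse_click` body are illustrations of the proposal, not existing BrowserGym code.

```python
import functools
from typing import Any, Callable, Optional, Tuple

# Hypothetical module-level state holding the last known mouse position.
_last_mouse_position: Optional[Tuple[float, float]] = None


def tracks_mouse(action: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a coordinate-based action so it records where the mouse ended up."""

    @functools.wraps(action)
    def wrapper(x: float, y: float, *args: Any, **kwargs: Any) -> Any:
        global _last_mouse_position
        result = action(x, y, *args, **kwargs)
        # Only record the position once the action completed without raising.
        _last_mouse_position = (x, y)
        return result

    return wrapper


def get_last_mouse_position() -> Optional[Tuple[float, float]]:
    """Return the position recorded by the most recent tracked action."""
    return _last_mouse_position


@tracks_mouse
def mouse_click(x: float, y: float, button: str = "left") -> str:
    """Placeholder action; a real one would drive the browser here."""
    return f"click {button} at ({x}, {y})"
```

Actions that target an element by bid rather than by coordinates would need an extra step (for example, recording the point that was actually clicked), which this sketch does not cover.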

gasse requested changes Dec 3, 2024

ryanhoangt added 3 commits December 3, 2024 13:49

add mouse position to env obs

7ada7ee

move init to reset

ad79b71

fix tests

a701498

gasse force-pushed the add-mouse-position branch from cd33d61 to a701498 on December 3, 2024 18:49

use tuple type

e9694a1

track last mousemove with iframe info

4202aab

amanjaiswal73892 linked an issue Jul 16, 2025 that may be closed by this pull request

mouse coord as observation #348

Open

amanjaiswal73892 requested a review from recursix July 16, 2025 16:52

amanjaiswal73892 added the enhancement New feature or request label Jul 16, 2025

ryanhoangt wants to merge 5 commits into ServiceNow:main from ryanhoangt:add-mouse-position

4 participants
