Add mouse position into env observation#282
Add mouse position into env observation#282ryanhoangt wants to merge 5 commits intoServiceNow:mainfrom
Conversation
gasse
left a comment
There was a problem hiding this comment.
Nice feature! See my comments to make it robust to iFrames
| window.addEventListener("load", () => {window.browsergym_page_activated();}, {capture: true}); | ||
| window.addEventListener("pageshow", () => {window.browsergym_page_activated();}, {capture: true}); | ||
| window.addEventListener("mousemove", () => {window.browsergym_page_activated();}, {capture: true}); | ||
| window.addEventListener("mousemove", (event) => {window.browsergym_page_activated(); window.pageX = event.clientX; window.pageY = event.clientY;}, {capture: true}); |
There was a problem hiding this comment.
Clean and simple, I like this
| Returns: | ||
| An array of the x and y coordinates of the mouse location. | ||
| """ | ||
| position = page.evaluate("""() => { |
There was a problem hiding this comment.
This will work for simple pages, but I'm worried about iframes. Here is something that could work:
- in the JS callback (mousemove), record the position in JS in the
windowobject, and also record which page / frame received this event, in Python with a method similar to_activate_page_from_js(). - to extract the mouse position in the browser viewport, take the latest mouse position (last iframe that received a mousemove event), and work your way up the frame hierarchy to reconstruct the current mouse position. See how we do that to get the coordinates of all elements in all iframes here:
tests/core/test_actions_highlevel.py
Outdated
|
|
||
| obs, reward, term, trunc, info = env.step(action) | ||
| checkbox = get_checkbox_elem(obs) | ||
| assert obs['mouse_position'] == [x, y] |
There was a problem hiding this comment.
That's a good test, can you do the same for other pages which have iFrames, and check you get the correct coordinates when clicking on elements inside the iframe? (clicking with coordinates, and with bid)
cd33d61 to
a701498
Compare
|
BTW, a cool way to try this feature is to run an openended agent on a whiteboard and ask it to draw simple forms, like we did for the demo video here |
|
Seems like there is pageX, pageY but also clientX, clientY https://developer.mozilla.org/en-US/docs/Web/API/MouseEvent/clientX Only way to know how / which one of these to use is to write some tests :) |
From the blog seems like clientX/clientY is relative to viewport, and pageX/pageY is relative to the whole webpage. I think clientX/clientY is closer to what we want 🤔 |
|
I would like to move forward with this, but cthe urrent code will not universally work. Might not the best solution, but we could itereate on this. |
Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions from
browsergymand it would be useful if the environment exposes this info via the observation. Not sure what the team thinks about this?One quirk is seems like there're no direct ways to get the mouse position from Playwright so I use a kinda hacky way to get that info.