Trouble understanding visual automation element identifiers - check this #143

Closed
crea7564 opened this issue May 9, 2020 · 1 comment

crea7564 commented May 9, 2020

Hi,

I'm new to rpa-python, thanks for the great package!

I was interested to try and get visual automation to work.

Based on the documentation, here's what I was under the impression could/should work:

  1. I capture an image of a button and save it somewhere, e.g. "/path/to/button.png"
  2. When I call r.click("/path/to/button.png"), it searches the screen for something visually similar using computer vision and clicks it

What I actually got to work (occasionally) was that it clicked the cell in my Jupyter notebook where I had written that code 😄

So I'm a bit confused. I tried screenshots, I tried exactly the same image file as the one I wanted it to recognise, and I tried basic shapes like a red square, but I never got it to click an image. So I'm wondering if my interpretation of the docs is off and I'm simply barking up the wrong tree?

The docs further say

If the image file specified does not exist, OCR will be used..

I can see how this is handy for making an RPA process more robust.

However, is there really no feedback on whether the OCR fallback or visual automation was used? No matter what I pass in, the r.click() API might as well be doing OCR only. Was it doing anything at all with my image, and how would I go about debugging this? By the way, this also makes it impossible to work in a Jupyter notebook, since it clicks my notebook code instead of the visual button I want to hit, which is quite funny but a bit frustrating 😃

I'm on macOS Catalina; other functionality was working.

Member

kensoh commented May 14, 2020

Hi @crea7564, you're welcome. I hope you enjoy using it once you get through this.

On whether OCR or image matching is used: if the image file you specified does not exist, OCR will be used. If that image file really exists on your computer, image matching is always used. So the behaviour is unambiguous, and depends on whether the user provides a real image path and filename, or some non-existent name to request OCR detection.

I'm not sure whether you have some special monitor or screen setup. Try the following to see if you can isolate the root cause:
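To get feedback on which method a given identifier will trigger, one option is to check the path yourself before calling r.click(). The following is a minimal sketch that only mirrors the rule described above; `detection_mode` is a hypothetical helper name and not part of rpa-python, and it assumes the library's existence check behaves like `os.path.isfile`.

```python
import os


def detection_mode(identifier):
    """Guess which method a visual identifier would trigger.

    Hypothetical helper (not part of rpa-python). It applies the rule
    stated in this thread: an existing image file means image matching,
    anything else means OCR. Assumes the check is equivalent to
    os.path.isfile; relative paths are resolved against the current
    working directory, and "~" is not expanded.
    """
    return "image" if os.path.isfile(identifier) else "ocr"
```

Calling `detection_mode("/path/to/button.png")` before the click shows whether a typo in the path, or a notebook working directory different from the one you expect, is silently turning the call into an OCR lookup.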

  1. use only laptop screen and detach any external monitors
  2. adjust your jupyter notebook window to bottom of screen
  3. capture some image snapshot using macOS cmd + shift + 4
  4. make sure that UI element is visible outside of jupyter window and not blocked by it
  5. try running r.click("/path/to/button.png") in notebook to see the result live

The main requirement is that what you want to click must already be visible, and remain visible, on the screen. It can be a desktop app on macOS or something on a web page in the browser.

Also, try r.click(1,1) to see if the mouse goes to the top-left corner. Then try the bottom-right corner (x,y) of your display resolution to confirm whether the problem is a resolution-mapping issue.
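The corner test can be sketched as a small function. `check_corner_mapping` is a hypothetical helper, and `width` and `height` are whatever resolution you believe your display has. It takes the rpa module as a parameter and assumes that `r.click(x, y)`, `r.mouse_x()` and `r.mouse_y()` are available once visual automation has been initialised.

```python
def check_corner_mapping(r, width, height):
    """Click near two screen corners and record where the pointer ends up.

    Hypothetical helper. `r` is expected to be the imported rpa module
    (import rpa as r) after r.init(visual_automation=True).
    """
    corners = {
        "top_left": (1, 1),
        "bottom_right": (width - 1, height - 1),
    }
    report = {}
    for name, (x, y) in corners.items():
        clicked = r.click(x, y)  # assumed to return True/False
        report[name] = {
            "requested": (x, y),
            "clicked": clicked,
            "pointer": (r.mouse_x(), r.mouse_y()),
        }
    return report
```

If the reported pointer position differs noticeably from the requested coordinates, that points towards a resolution-mapping problem rather than a problem with the image file.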

@kensoh kensoh added the query label May 14, 2020
@kensoh kensoh changed the title from "Trouble understanding visual automation element identifiers" to "Trouble understanding visual automation element identifiers - check this" May 14, 2020
@kensoh kensoh closed this as completed May 25, 2020