from the search results page +5. Confirm that you are on the Nothing phone 2 (128Gb) product page of the online store <name>. +6. Extract the price and availability of the Nothing Phone 2 (128GB) from the current product page. +7. Return to google search results page by navigating to the url https://www.google.com/search?q=Buy+Nothing+Phone+2+(128GB). +8. Confirm that you are on the google search results page for "Buy Nothing Phone 2 (128GB)". +9. Click on the second link titled <title> from the search results page +10. Continue until you have extracted the availability and price of Nothing Phone 2 (128GB) from all the online stores listed on the page. +"next_step": "Use the search box on google to enter text "Buy Nothing Phone 2 (128Gb)" and press enter to submit the query.", +"terminate":"no"} + +After the task is completed and when terminating: +Your reply: {"terminate":"yes", "final_response": "Here is the Nothing phone 2 price list: <price list>. The cheapest store is <store name> with price <price>."} + +Example 2: +Task: Find the cheapest premium economy flights from Helsinki to Stockholm on 15 March. Current page: www.skyscanner.com +{"plan":"1. List the interaction options available on skyscanner page relevant for flight reservation along with their default values. +2. Select the journey option to one-way (if not default). +3. Set number of passengers to 1 (if not default). +4. Set the departure date to 15 March 2025 (since 15 March 2024 is already past). +5. Set ticket type to Economy Premium. +6. Set from airport to "Helsinki". +7. Set destination airport to Stockholm. +8. Confirm that current values in the source airport, destination airport and departure date fields are Helsinki, Stockholm and 15 March 2025 respectively. +9. Click on the search button to get the search results. +10. Confirm that you are on the search results page. +11. 
Extract the price of the cheapest flight from Helsinki to Stockholm from the search results.", +"next_step": "List all interaction options available on this skyscanner page relevant for flight reservation. This could be source airport, destination airport etc. Also provide the current default values of the fields.", +"terminate":"no"}, +Notice above how there is confirmation after each step and how interaction (e.g. setting source and destination) with each element is a separate step. Follow the same pattern. + +Remember: you are a very very persistent planner who will try every possible strategy to accomplish the task perfectly. +Revise search query if needed, ask for more information if needed, and always verify the results before terminating the task. +Some basic information about the user: $basic_user_information""", + + "BROWSER_AGENT_PROMPT": """You will perform web navigation tasks, which may include logging into websites and interacting with any web content using the functions made available to you. + Use the provided DOM representation for element location or text summarization. + Interact with pages using only the "mmid" attribute in DOM elements. + You must extract the mmid value from the fetched DOM; do not make it up. + Execute functions sequentially to avoid navigation timing issues. Once a task is completed, confirm completion with ##TERMINATE TASK##. + The given actions are NOT parallelizable. They are intended for sequential execution. + If you need to call multiple functions in a task step, call one function at a time. Wait for the function's response before invoking the next function. This is important to avoid collision. + Strictly for search fields, submit the field by pressing the Enter key. For other forms, click on the submit button. + Unless otherwise specified, the task must be performed on the current page. Use openurl only when explicitly instructed to navigate to a new page with a url specified. If you do not know the URL, ask for it. 
+ You will NOT provide any URLs of links on the webpage. If the user asks for URLs, you will instead provide the text of the hyperlink on the page and offer to click on it. This is very very important. + When inputting information, remember to follow the format of the input field. For example, if the input field is a date field, you will enter the date in the correct format (e.g. YYYY-MM-DD); you may get clues from the placeholder text in the input field. + If the task is ambiguous or there are multiple options to choose from, you will ask the user for clarification. You will not make any assumptions. + Each function will report whether the action succeeded and whether any changes were observed as a consequence. Adjust your approach based on this feedback. + Once the task is completed or cannot be completed, return a short summary of the actions you performed to accomplish the task, and what worked and what did not. This should be followed by ##TERMINATE TASK##. Your reply will not contain any other information. + Additionally, if the task requires an answer, you will also provide a short and precise answer followed by ##TERMINATE TASK##. + Ensure that user questions are answered from the DOM and not from memory or assumptions. To answer a question about textual information on the page, prefer to use text_only DOM type. To answer a question about interactive elements, use all_fields DOM type. + Do not provide any mmid values in your response. + Important: If you encounter an issue or are unsure how to proceed, simply ##TERMINATE TASK## and provide a detailed summary of the exact issue encountered. + Do not repeat the same action multiple times if it fails. Instead, if something did not work after a few attempts, terminate the task.""", + + + "VERFICATION_AGENT": """Given a conversation and a task, your task is to analyse the conversation and tell if the task is completed. 
If not, you need to tell what is not completed and suggest next steps to complete the task.""", + "ENTER_TEXT_AND_CLICK_PROMPT": """This skill enters text into a specified element and clicks another element, both identified by their DOM selector queries. + Ideal for seamless actions like submitting search queries, this integrated approach ensures superior performance over separate text entry and click commands. + Successfully completes when both actions are executed without errors, returning True; otherwise, it provides False or an explanatory message of any failure encountered. + Always prefer this dual-action skill for tasks that combine text input and element clicking to leverage its streamlined operation.""", + + + "OPEN_URL_PROMPT": """Opens a specified URL in the web browser instance. Returns the URL of the new page if successful, or an appropriate error message if the page could not be opened.""", + + + "GO_BACK_PROMPT": """Goes back to the previous page in the browser history. Useful when correcting an incorrect action that led to a new page or when needing to revisit a previous page for information. Returns the full URL of the page after the back action is performed.""", + + + "COMMAND_EXECUTION_PROMPT": """Execute the user task "$command" $current_url_prompt_segment""", + + + "GET_USER_INPUT_PROMPT": """Get clarification by asking the user or wait for the user to perform an action on the webpage. This is useful e.g. when you encounter a login or captcha and need the user to intervene. This skill will also be useful when the task is ambiguous and you need more clarification from the user (e.g. ["which source website to use to accomplish a task"], ["Enter your credentials on your webpage and type done to continue"]). Use this skill very sparingly and only when absolutely needed.""", + + + "GET_DOM_WITHOUT_CONTENT_TYPE_PROMPT": """Retrieves the DOM of the current web browser page. + Each DOM element will have an \"mmid\" attribute injected for ease of DOM interaction. 
+ Returns a minified representation of the HTML DOM where each HTML DOM Element has an attribute called \"mmid\" for ease of DOM query selection. When the \"mmid\" attribute is available, use it for DOM query selectors.""", + + + # This one below had all three content types including input_fields + "GET_DOM_WITH_CONTENT_TYPE_PROMPT": """Retrieves the DOM of the current web site based on the given content type. + The DOM representation returned contains items ordered in the same way they appear on the page. Keep this in mind when executing user requests that contain ordinals or numbered items. + text_only - returns plain text representing all the text in the web site. Use this for any information retrieval task. This will contain the most complete textual information. + input_fields - returns a JSON string containing a list of objects representing text input html elements with mmid attribute. Use this strictly for interaction purposes with text input fields. + all_fields - returns a JSON string containing a list of objects representing all interactive elements and their attributes with mmid attribute. Use this strictly to identify and interact with any type of element on the page. + If information is not available in one content type, you must try another content_type.""", + + + "GET_ACCESSIBILITY_TREE": """Retrieves the accessibility tree of the current web site. + The DOM representation returned contains items ordered in the same way they appear on the page. Keep this in mind when executing user requests that contain ordinals or numbered items.""", + + + "CLICK_PROMPT": """Executes a click action on the element matching the given mmid attribute value. It is best to use mmid attribute as the selector. + Returns Success if click was successful or appropriate error message if the element could not be clicked.""", + + + "CLICK_PROMPT_ACCESSIBILITY": """Executes a click action on the element matching the given name and role. 
+ Returns Success if click was successful or appropriate error message if the element could not be clicked.""", + + + "GET_URL_PROMPT": """Get the full URL of the current web page/site. If the user command seems to imply an action that would be suitable for an already open website in their browser, use this to fetch the current website URL.""", + + + "ENTER_TEXT_PROMPT": """Enters the given text in a single DOM element matching the given mmid attribute value. This will only enter the text and not press enter or anything else. + Returns Success if text entry was successful or appropriate error message if text could not be entered.""", + + + "CLICK_BY_TEXT_PROMPT": """Executes a click action on the element matching the text. If multiple text matches are found, it will click on all of them. Use this as a last resort when all else fails.""", + + "BULK_ENTER_TEXT_PROMPT": """Bulk enter text in multiple DOM fields. To be used when there are multiple fields to be filled on the same page. + Enters text in the DOM elements matching the given mmid attribute value. + The input will receive a list of objects containing the DOM query selector and the text to enter. + This will only enter the text and not press enter or anything else. + Returns each selector and the result for attempting to enter text.""", + + + "PRESS_KEY_COMBINATION_PROMPT": """Presses the given key on the current web page. + This is useful for pressing the enter button to submit a search query, PageDown to scroll, ArrowDown to change selection in a focussed list etc.""", + + + "ADD_TO_MEMORY_PROMPT": """Save any information that you may need later in this term memory. This could be useful for saving things to do, saving information for personalisation, or even saving information you may need in future for efficiency purposes, e.g. 
Remember to call John at 5pm, This user likes Tesla company and considered buying shares, The user enrollment form is available in <url> etc.""", + + "HOVER_PROMPT": """Hover on an element with the given mmid attribute value. Hovering on an element can reveal additional information such as a tooltip or trigger a dropdown menu with different navigation options.""", + "GET_MEMORY_PROMPT": """Retrieve all the information previously stored in the memory""", + + + "PRESS_ENTER_KEY_PROMPT": """Presses the enter key in the given html field. This is most useful on text input fields.""", + + + "EXTRACT_TEXT_FROM_PDF_PROMPT": """Extracts text from a PDF file hosted at the given URL.""", + + + "BROWSER_AGENT_NO_SKILLS_PROMPT": """You are an autonomous agent tasked with performing web navigation on a Playwright instance, including logging into websites and executing other web-based actions. + You will receive user commands, formulate a plan and then write the PYTHON code that is needed for the task to be completed. + It is possible that the code you are writing is for one step at a time in the plan. This will ensure proper execution of the task. + Your operations must be precise and efficient, adhering to the guidelines provided below: + 1. **Asynchronous Code Execution**: Your tasks will often be asynchronous in nature, requiring careful handling. Wrap asynchronous operations within an appropriate async structure to ensure smooth execution. + 2. **Sequential Task Execution**: To avoid issues related to navigation timing, execute your actions in a sequential order. This method ensures that each step is completed before the next one begins, maintaining the integrity of your workflow. Some steps like navigating to a site will require a small amount of wait time after them to ensure they load correctly. + 3. **Error Handling and Debugging**: Implement error handling to manage exceptions gracefully. 
Should an error occur or if the task doesn't complete as expected, review your code, adjust as necessary, and retry. Use the console or logging for debugging purposes to track the progress and issues. + 4. **Using HTML DOM**: Do not assume what a DOM selector (web elements) might be. Rather, fetch the DOM to look for the selectors or fetch DOM inner text to answer questions. This is crucial for accurate task execution. When you fetch the DOM, reason about its content to determine appropriate selectors or text that should be extracted. To fetch the DOM using playwright you can: + - Fetch the entire DOM using the page.content() method. In the fetched DOM, consider whether it is appropriate to remove entire sections of the DOM like `script`, `link` elements + - Fetch the DOM inner text only: text_content = await page.evaluate("() => document.body.innerText || document.documentElement.innerText"). This is useful for information retrieval. + 5. **DOM Handling**: Never ever substring the extracted HTML DOM. You can remove entire sections/elements of the DOM like `script`, `link` elements if they are not needed for the task. This is crucial for accurate task execution. + 6. **Execution Verification**: After executing the given code, ensure that you verify the completion of the task. If the task is not completed, revise your plan then rewrite the code for that step. + 7. **Termination Protocol**: Once a task is verified as complete or if it's determined that further attempts are unlikely to succeed, conclude the operation and respond with `##TERMINATE##` to indicate the end of the session. This signal should only be used when the task is fully completed or if there's a consensus that continuation is futile. + 8. **Code Modification and Retry Strategy**: If your initial code doesn't achieve the desired outcome, revise your approach based on the insights gained during the process. 
When DOM selectors you are using fail, fetch the DOM and reason about it to discover the right selectors. If there are timeouts, increase the wait times. Add other error handling mechanisms before retrying as needed. + 9. **Code Generation**: Generated code does not need documentation or usage examples. Assume that it is being executed by an autonomous agent acting on behalf of the user. Do not add placeholders in the code. + 10. **Browser Handling**: Do not use headless mode with playwright. Do not close the browser after every step or even after task completion. Leave it open. + 11. **Response**: Remember that you are communicating with an autonomous agent that does not reason. All it does is execute code. Only respond with code that it can execute unless you are terminating. + 12. **Playwright Oddities**: There are certain things that Playwright does not do well: + - page.wait_for_selector: When providing a timeout value, it will almost always time out. Put that call in a try/except block and catch the timeout. If a timeout occurs, just move to the next statement in the code and most likely it will work. For example, if the next statement is page.fill, just execute it. + + + By following these guidelines, you will enhance the efficiency, reliability, and user interaction of your web navigation tasks. + Always aim for clear, concise, and well-structured code that aligns with best practices in asynchronous programming and web automation. 
+ """, } diff --git a/ae/core/skills/__init__.py b/ae/core/skills/__init__.py index 9931cf7..4ecf9f1 100644 --- a/ae/core/skills/__init__.py +++ b/ae/core/skills/__init__.py @@ -15,6 +15,4 @@ from ae.core.skills.get_user_input import get_user_input from ae.core.skills.open_url import openurl -from ae.core.skills.press_key_combination import do_press_key_combination -from ae.core.skills.press_key_combination import press_enter_key from ae.core.skills.press_key_combination import press_key_combination \ No newline at end of file diff --git a/ae/core/skills/click_using_selector.py b/ae/core/skills/click_using_selector.py index 887f3ac..dfbf4e5 100644 --- a/ae/core/skills/click_using_selector.py +++ b/ae/core/skills/click_using_selector.py @@ -8,12 +8,13 @@ from ae.core.playwright_manager import PlaywrightManager from ae.utils.dom_helper import get_element_outer_html -from ae.utils.dom_mutation_observer import subscribe -from ae.utils.dom_mutation_observer import unsubscribe +from ae.utils.dom_mutation_observer import subscribe # type: ignore +from ae.utils.dom_mutation_observer import unsubscribe # type: ignore from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType -async def click(selector: Annotated[str, "The properly formed query selector string to identify the element for the click action. When \"mmid\" attribute is present, use it for the query selector."], +async def click(selector: Annotated[str, "The properly formed query selector string to identify the element for the click action (e.g. [mmid='114']). When \"mmid\" attribute is present, use it for the query selector."], wait_before_execution: Annotated[float, "Optional wait time in seconds before executing the click event logic.", float] = 0.0) -> Annotated[str, "A message indicating success or failure of the click."]: """ Executes a click action on the element matching the given query selector string within the currently open web page. 
@@ -35,7 +36,7 @@ async def click(selector: Annotated[str, "The properly formed query selector str if page is None: # type: ignore raise ValueError('No active page found. OpenURL command opens a new page.') - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) @@ -51,15 +52,10 @@ def detect_dom_changes(changes:str): # type: ignore await asyncio.sleep(0.1) # sleep for 100ms to allow the mutation observer to detect changes unsubscribe(detect_dom_changes) await browser_manager.take_screenshots(f"{function_name}_end", page) - await browser_manager.notify_user(result["summary_message"]) + await browser_manager.notify_user(result["summary_message"], message_type=MessageType.ACTION) if dom_changes_detected: - return f"{result['detailed_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. Get all_fields to interact with the elements." - return result["detailed_message"] - - - result = await do_click(page, selector, wait_before_execution) - await browser_manager.notify_user(result["summary_message"]) + return f"Success: {result['summary_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. This means that the action to click {selector} is not yet executed and needs further interaction. Get all_fields DOM to complete the interaction." 
return result["detailed_message"] @@ -109,7 +105,7 @@ async def do_click(page: Page, selector: str, wait_before_execution: float) -> d element_tag_name = await element.evaluate("element => element.tagName.toLowerCase()") element_outer_html = await get_element_outer_html(element, page, element_tag_name) - + if element_tag_name == "option": element_value = await element.get_attribute("value") # get the text that is in the value of the option @@ -118,16 +114,15 @@ async def do_click(page: Page, selector: str, wait_before_execution: float) -> d await parent_element.select_option(value=element_value) # type: ignore logger.info(f'Select menu option "{element_value}" selected') - - return {"summary_message": f'Select menu option "{element_value}" selected', + + return {"summary_message": f'Select menu option "{element_value}" selected', "detailed_message": f'Select menu option "{element_value}" selected. The select element\'s outer HTML is: {element_outer_html}.'} - - await element.focus() + + #Playwright click seems to fail more often than not, disabling it for now and just going with JS click #await perform_playwright_click(element, selector) - await perform_javascript_click(page, selector) - msg = f"Element with selector: \"{selector}\" clicked." - return {"summary_message": msg, "detailed_message": f"{msg} The clicked element's outer HTML is: {element_outer_html}."} + msg = await perform_javascript_click(page, selector) + return {"summary_message": msg, "detailed_message": f"{msg} The clicked element's outer HTML is: {element_outer_html}."} # type: ignore except Exception as e: logger.error(f"Unable to click element with selector: \"{selector}\". 
Error: {e}") traceback.print_exc() @@ -202,7 +197,12 @@ async def perform_javascript_click(page: Page, selector: str): if (element.tagName.toLowerCase() === "a") { element.target = "_self"; } + let ariaExpandedBeforeClick = element.getAttribute('aria-expanded'); element.click(); + let ariaExpandedAfterClick = element.getAttribute('aria-expanded'); + if (ariaExpandedBeforeClick === 'false' && ariaExpandedAfterClick === 'true') { + return "Executed JavaScript Click on element with selector: "+selector +". Very important: As a consequence a menu has appeared where you may need to make a further selection. Very important: Get all_fields DOM to complete the action."; + } return "Executed JavaScript Click on element with selector: "+selector; } }""" diff --git a/ae/core/skills/enter_text_and_click.py b/ae/core/skills/enter_text_and_click.py index 9996416..dd2a926 100644 --- a/ae/core/skills/enter_text_and_click.py +++ b/ae/core/skills/enter_text_and_click.py @@ -7,6 +7,7 @@ from ae.core.skills.enter_text_using_selector import do_entertext from ae.core.skills.press_key_combination import do_press_key_combination from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType async def enter_text_and_click( @@ -46,12 +47,12 @@ async def enter_text_and_click( await browser_manager.highlight_element(text_selector, True) - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) text_entry_result = await do_entertext(page, text_selector, text_to_enter, use_keyboard_fill=True) - await browser_manager.notify_user(text_entry_result["summary_message"]) + #await browser_manager.notify_user(text_entry_result["summary_message"]) if not text_entry_result["summary_message"].startswith("Success"): await browser_manager.take_screenshots(f"{function_name}_end", page) return(f"Failed to enter text '{text_to_enter}' into element with 
selector '{text_selector}'. Check that the selector is valid.") @@ -63,16 +64,16 @@ async def enter_text_and_click( do_press_key_combination_result = await do_press_key_combination(browser_manager, page, "Enter") if do_press_key_combination_result: result["detailed_message"] += f" Instead of click, pressed the Enter key successfully on element: \"{click_selector}\"." - await browser_manager.notify_user(f"Pressed the Enter key successfully on element: \"{click_selector}\".") + await browser_manager.notify_user(f"Pressed the Enter key successfully on element: \"{click_selector}\".", message_type=MessageType.ACTION) else: result["detailed_message"] += f" Clicking the same element after entering text in it, is of no value. Tried pressing the Enter key on element \"{click_selector}\" instead of click and failed." - await browser_manager.notify_user("Failed to press the Enter key on element \"{click_selector}\".") + await browser_manager.notify_user(f"Failed to press the Enter key on element \"{click_selector}\".", message_type=MessageType.ACTION) else: await browser_manager.highlight_element(click_selector, True) do_click_result = await do_click(page, click_selector, wait_before_click_execution) result["detailed_message"] += f' {do_click_result["detailed_message"]}' - await browser_manager.notify_user(do_click_result["summary_message"]) + #await browser_manager.notify_user(do_click_result["summary_message"]) await asyncio.sleep(0.1) # sleep for 100ms to allow the mutation observer to detect changes diff --git a/ae/core/skills/enter_text_using_selector.py b/ae/core/skills/enter_text_using_selector.py index fe369bb..0491092 100644 --- a/ae/core/skills/enter_text_using_selector.py +++ b/ae/core/skills/enter_text_using_selector.py @@ -13,6 +13,7 @@ from ae.utils.dom_mutation_observer import subscribe from ae.utils.dom_mutation_observer import unsubscribe from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType @dataclass @@ -62,11 +63,12 @@ async 
def custom_fill_element(page: Page, selector: str, text_to_enter: str): selector = f"{selector}" # Ensures the selector is treated as a string await page.evaluate("""(inputParams) => { const selector = inputParams.selector; - const text_to_enter = inputParams.text_to_enter; + let text_to_enter = inputParams.text_to_enter; + text_to_enter = text_to_enter.trim(); document.querySelector(selector).value = text_to_enter; }""", {"selector": selector, "text_to_enter": text_to_enter}) -async def entertext(entry: Annotated[EnterTextEntry, "An object containing 'query_selector' (DOM selector query using mmid attribute) and 'text' (text to enter on the element)."]) -> Annotated[str, "Explanation of the outcome of this operation."]: +async def entertext(entry: Annotated[EnterTextEntry, "An object containing 'query_selector' (DOM selector query using mmid attribute e.g. [mmid='114']) and 'text' (text to enter on the element)."]) -> Annotated[str, "Explanation of the outcome of this operation."]: """ Enters text into a DOM element identified by a CSS selector. @@ -106,11 +108,12 @@ async def entertext(entry: Annotated[EnterTextEntry, "An object containing 'quer if page is None: # type: ignore return "Error: No active page found. OpenURL command opens a new page." 
- function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) await browser_manager.highlight_element(query_selector, True) + dom_changes_detected=None def detect_dom_changes(changes:str): # type: ignore nonlocal dom_changes_detected @@ -124,13 +127,13 @@ def detect_dom_changes(changes:str): # type: ignore await browser_manager.take_screenshots(f"{function_name}_end", page) - await browser_manager.notify_user(result["summary_message"]) + await browser_manager.notify_user(result["summary_message"], message_type=MessageType.ACTION) if dom_changes_detected: - return f"{result['detailed_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. Get all_fields to interact with the elements." + return f"{result['detailed_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. This means that the action of entering text {text_to_enter} is not yet executed and needs further interaction. Get all_fields DOM to complete the interaction." return result["detailed_message"] -async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboard_fill: bool=False): +async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboard_fill: bool=True): """ Performs the text entry operation on a DOM element. @@ -157,6 +160,7 @@ async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboa - If 'use_keyboard_fill' is set to False, the function uses the 'custom_fill_element' method to enter the text. 
""" try: + logger.debug(f"Looking for selector {selector} to enter text: {text_to_enter}") elem = await page.query_selector(selector) @@ -170,19 +174,19 @@ async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboa if use_keyboard_fill: await elem.focus() + await asyncio.sleep(0.1) await press_key_combination("Control+A") await asyncio.sleep(0.1) await press_key_combination("Backspace") + await asyncio.sleep(0.1) logger.debug(f"Focused element with selector {selector} to enter text") - await page.keyboard.type(text_to_enter, delay=2) + #add a 100ms delay + await page.keyboard.type(text_to_enter, delay=1) else: await custom_fill_element(page, selector, text_to_enter) - logger.info(f"Success. Text \"{text_to_enter}\" set successfully in the element with selector {selector}") await elem.focus() - await page.keyboard.type("") # some html pages can have placeholders that only disappear upon keyboard input - await asyncio.sleep(1) + logger.info(f"Success. Text \"{text_to_enter}\" set successfully in the element with selector {selector}") success_msg = f"Success. 
Text \"{text_to_enter}\" set successfully in the element with selector {selector}" - return {"summary_message": success_msg, "detailed_message": f"{success_msg} and outer HTML: {element_outer_html}."} except Exception as e: diff --git a/ae/core/skills/get_dom_with_content_type.py b/ae/core/skills/get_dom_with_content_type.py index c60c2ba..8d0de3e 100644 --- a/ae/core/skills/get_dom_with_content_type.py +++ b/ae/core/skills/get_dom_with_content_type.py @@ -10,6 +10,7 @@ from ae.utils.dom_helper import wait_for_non_loading_dom_state from ae.utils.get_detailed_accessibility_tree import do_get_accessibility_info from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType async def get_dom_with_content_type( @@ -73,7 +74,7 @@ async def get_dom_with_content_type( elapsed_time = time.time() - start_time logger.info(f"Get DOM Command executed in {elapsed_time} seconds") - await browser_manager.notify_user(user_success_message) + await browser_manager.notify_user(user_success_message, message_type=MessageType.ACTION) return extracted_data # type: ignore @@ -81,7 +82,7 @@ async def get_filtered_text_content(page: Page) -> str: text_content = await page.evaluate(""" () => { // Array of query selectors to filter out - const selectorsToFilter = ['#agentDriveAutoOverlay']; + const selectorsToFilter = ['#agente-overlay']; // Store the original visibility values to revert later const originalStyles = []; @@ -101,6 +102,7 @@ async def get_filtered_text_content(page: Page) -> str: // Get all the alt text from images on the page let altTexts = Array.from(document.querySelectorAll('img')).map(img => img.alt); altTexts="Other Alt Texts in the page: " + altTexts.join(' '); + // Revert the visibility changes originalStyles.forEach(entry => { entry.element.style.visibility = entry.originalStyle; @@ -109,4 +111,5 @@ async def get_filtered_text_content(page: Page) -> str: return textContent; } """) - return text_content \ No newline at end of file + return 
text_content + diff --git a/ae/core/skills/get_url.py b/ae/core/skills/get_url.py index f26323a..343c3da 100644 --- a/ae/core/skills/get_url.py +++ b/ae/core/skills/get_url.py @@ -1,7 +1,6 @@ from typing import Annotated from ae.core.playwright_manager import PlaywrightManager -from ae.utils.logger import logger async def geturl() -> Annotated[str, "Returns the full URL of the current active web site/page."]: @@ -14,7 +13,7 @@ async def geturl() -> Annotated[str, "Returns the full URL of the current active - Full URL the browser's active page. """ - logger.info("Executing Get URL Command") + try: # Create and use the PlaywrightManager browser_manager = PlaywrightManager(browser_type='chromium', headless=False) @@ -23,10 +22,19 @@ async def geturl() -> Annotated[str, "Returns the full URL of the current active if not page: raise ValueError('No active page found. OpenURL command opens a new page.') + await page.wait_for_load_state("domcontentloaded") + # Get the URL of the current page - url = page.url - logger.debug("Returning URL: "+url) - await browser_manager.notify_user("Grabbed the URL of the current page.") - return url + try: + title = await page.title() + current_url = page.url + if len(current_url) >250: + current_url = current_url[:250] + "..." + return f"Current Page: {current_url}, Title: {title}" # type: ignore + except: # noqa: E722 + current_url = page.url + return f"Current Page: {current_url}" + except Exception as e: raise ValueError('No active page found. 
OpenURL command opens a new page.') from e + diff --git a/ae/core/skills/get_user_input.py b/ae/core/skills/get_user_input.py index 9fcfb49..df72ac2 100644 --- a/ae/core/skills/get_user_input.py +++ b/ae/core/skills/get_user_input.py @@ -15,6 +15,7 @@ async def get_user_input(questions: Annotated[List[str], "List of questions to a Returns: - Newline separated list of questions to ask the user """ + answers: dict[str, str] = {} browser_manager = PlaywrightManager(browser_type='chromium', headless=False) if browser_manager.ui_manager: diff --git a/ae/core/skills/open_url.py b/ae/core/skills/open_url.py index d3b15e6..1acd5e8 100644 --- a/ae/core/skills/open_url.py +++ b/ae/core/skills/open_url.py @@ -3,8 +3,8 @@ from ae.core.playwright_manager import PlaywrightManager from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType -#Annotated[Page, "The page instance that navigated to the specified URL."] async def openurl(url: Annotated[str, "The URL to navigate to. Value must include the protocol (http:// or https://)."], timeout: Annotated[int, "Additional wait time in seconds after initial load."] = 3) -> Annotated[str, "Returns the result of this request in text form"]: @@ -20,12 +20,11 @@ async def openurl(url: Annotated[str, "The URL to navigate to. Value must includ - URL of the new page. """ logger.info(f"Opening URL: {url}") - browser_manager = PlaywrightManager(browser_type='chromium', headless=False) await browser_manager.get_browser_context() page = await browser_manager.get_current_page() # Navigate to the URL with a short timeout to ensure the initial load starts - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore try: await browser_manager.take_screenshots(f"{function_name}_start", page) url = ensure_protocol(url) @@ -37,9 +36,11 @@ async def openurl(url: Annotated[str, "The URL to navigate to. 
Value must includ await browser_manager.take_screenshots(f"{function_name}_end", page) - await browser_manager.notify_user(f"Opened URL: {url}") - return f"Page loaded: {page.url.split('?')[0]}" # type: ignore - + await browser_manager.notify_user(f"Opened URL: {url}", message_type=MessageType.ACTION) + # Get the page title + title = await page.title() + url=page.url + return f"Page loaded: {url}, Title: {title}" # type: ignore def ensure_protocol(url: str) -> str: """ diff --git a/ae/core/skills/pdf_text_extractor.py b/ae/core/skills/pdf_text_extractor.py index f3734b3..be05081 100644 --- a/ae/core/skills/pdf_text_extractor.py +++ b/ae/core/skills/pdf_text_extractor.py @@ -7,6 +7,7 @@ from ae.config import PROJECT_TEMP_PATH from ae.core.playwright_manager import PlaywrightManager from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType async def extract_text_from_pdf(pdf_url: Annotated[str, "The URL of the PDF file to extract text from."]) -> Annotated[str, "All the text found in the PDF file."]: @@ -35,7 +36,7 @@ async def extract_text_from_pdf(pdf_url: Annotated[str, "The URL of the PDF file text += page_text + "\n" extracted_text = text.strip() word_count = len(extracted_text.split()) - await browser_manager.notify_user(f"Extracted text from the PDF successfully. Found {word_count} words.") + await browser_manager.notify_user(f"Extracted text from the PDF successfully. 
Found {word_count} words.", message_type=MessageType.ACTION) return "Text found in the PDF:\n" + extracted_text except httpx.HTTPStatusError as e: logger.error(f"An error occurred while downloading the PDF from {pdf_url}: {str(e)}") diff --git a/ae/core/skills/press_key_combination.py b/ae/core/skills/press_key_combination.py index aec8f67..3660dab 100644 --- a/ae/core/skills/press_key_combination.py +++ b/ae/core/skills/press_key_combination.py @@ -1,15 +1,17 @@ +import asyncio import inspect -import time from typing import Annotated -from playwright.async_api import Page +from playwright.async_api import Page # type: ignore from ae.core.playwright_manager import PlaywrightManager -from ae.core.skills.click_using_selector import do_click +from ae.utils.dom_mutation_observer import subscribe # type: ignore +from ae.utils.dom_mutation_observer import unsubscribe # type: ignore from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType -async def press_key_combination(key_combination: Annotated[str, "The key combination to press using '+' as a separator, e.g., 'Control+C', Enter."]) -> str: +async def press_key_combination(key_combination: Annotated[str, "The key to press, e.g., Enter, PageDown etc"]) -> str: """ Presses a key combination on the current active page managed by PlaywrightManager. 
@@ -28,7 +30,6 @@ async def press_key_combination(key_combination: Annotated[str, "The key combina """ logger.info(f"Executing press_key_combination with key combo: {key_combination}") - start_time = time.time() # Create and use the PlaywrightManager browser_manager = PlaywrightManager() page = await browser_manager.get_current_page() @@ -39,6 +40,12 @@ async def press_key_combination(key_combination: Annotated[str, "The key combina # Split the key combination if it's a combination of keys keys = key_combination.split('+') + dom_changes_detected=None + def detect_dom_changes(changes:str): # type: ignore + nonlocal dom_changes_detected + dom_changes_detected = changes # type: ignore + + subscribe(detect_dom_changes) # If it's a combination, hold down the modifier keys for key in keys[:-1]: # All keys except the last one are considered modifier keys await page.keyboard.down(key) @@ -49,26 +56,14 @@ async def press_key_combination(key_combination: Annotated[str, "The key combina # Release the modifier keys for key in keys[:-1]: await page.keyboard.up(key) + await asyncio.sleep(0.1) # sleep for 100ms to allow the mutation observer to detect changes + unsubscribe(detect_dom_changes) - print(f"Operation completed in {time.time() - start_time} seconds.") - return f"Key combination {key_combination} executed successfully" - -async def press_enter_key(selector: Annotated[str, """The properly formed query selector string to identify the element to press enter key in. - When \"mmid\" attribute is present, use it for the query selector."""]) -> Annotated[str, "A message indicating success or failure."]: - logger.info(f"Executing press_enter_key with selector: \"{selector}\"") - browser_manager = PlaywrightManager(browser_type='chromium', headless=False) - page = await browser_manager.get_current_page() - - if page is None: # type: ignore - raise ValueError('No active page found. 
OpenURL command opens a new page.') + if dom_changes_detected: + return f"Key {key_combination} executed successfully.\n As a consequence of this action, new elements have appeared in view:{dom_changes_detected}. This means that the action is not yet executed and needs further interaction. Get all_fields DOM to complete the interaction." - await do_click(page, selector, wait_before_execution=0.0) - result = await do_press_key_combination(browser_manager, page, 'Enter') - - if result: - return f"Enter key pressed in field with selector: {selector}" - else: - return f"Failed to press Enter key in field with selector: {selector}" + await browser_manager.notify_user(f"Key {key_combination} executed successfully", message_type=MessageType.ACTION) + return f"Key {key_combination} executed successfully" async def do_press_key_combination(browser_manager: PlaywrightManager, page: Page, key_combination: str) -> bool: @@ -90,7 +85,7 @@ async def do_press_key_combination(browser_manager: PlaywrightManager, page: Pag logger.info(f"Executing press_key_combination with key combo: {key_combination}") try: - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) # Split the key combination if it's a combination of keys keys = key_combination.split('+') @@ -113,3 +108,4 @@ async def do_press_key_combination(browser_manager: PlaywrightManager, page: Pag await browser_manager.take_screenshots(f"{function_name}_end", page) return True + diff --git a/ae/core/system_orchestrator.py b/ae/core/system_orchestrator.py index de98f27..31c4c73 100644 --- a/ae/core/system_orchestrator.py +++ b/ae/core/system_orchestrator.py @@ -7,6 +7,7 @@ from ae.config import SOURCE_LOG_FOLDER_PATH from ae.core.autogen_wrapper import AutogenWrapper from ae.utils.cli_helper import async_input # type: ignore +from ae.utils.http_helper import make_post_request from 
ae.utils.logger import logger @@ -24,7 +25,7 @@ class SystemOrchestrator: shutdown_event (asyncio.Event): Event to wait for an exit command to be processed. """ - def __init__(self, agent_scenario:str="user_proxy,browser_nav_agent", input_mode:str="GUI_ONLY"): + def __init__(self, agent_scenario:str="user,planner_agent,browser_nav_agent,browser_nav_executor", input_mode:str="GUI_ONLY"): """ Initializes the system orchestrator with the specified agent scenario and input mode. @@ -37,16 +38,34 @@ def __init__(self, agent_scenario:str="user_proxy,browser_nav_agent", input_mode self.browser_manager = None self.autogen_wrapper = None self.is_running = False + + if os.getenv('ORCHESTRATOR_API_KEY', None) is not None and os.getenv('ORCHESTRATOR_GATEWAY', None) is not None: + self.__populate_orchestrator_info() + logger.info(f"Orchestrator endpoint: {self.orchestrator_endpoint}") + else: + self.use_orchestrator = False + self.__parse_user_and_browser_agent_names() self.shutdown_event = asyncio.Event() #waits for an exit command to be processed + + def __populate_orchestrator_info(self): + """ + Populates the orchestrator information by retrieving the API key, gateway, and endpoint from environment variables. 
+ """ + self.orchestrator_api_key = os.getenv('ORCHESTRATOR_API_KEY') + self.orchestrator_gateway = os.getenv('ORCHESTRATOR_GATEWAY') + self.orchestrator_endpoint = f"{self.orchestrator_gateway}/api/orchestrate" + self.use_orchestrator = True + + def __parse_user_and_browser_agent_names(self): """ Parse the user and browser agent names from agent_scenario """ self.agent_names = self.agent_scenario.split(',') for agent_name in self.agent_names: - if 'user_proxy' in agent_name: + if 'user' in agent_name: self.ser_agent_name = agent_name else: self.browser_agent_name = agent_name @@ -92,6 +111,25 @@ async def receive_command(self, command: str): """ await self.process_command(command) + async def __orchestrate_command(self, command: str): + if not self.use_orchestrator: + return command + + orch_response = make_post_request(self.orchestrator_endpoint, {"query": command}, self.orchestrator_api_key, api_key_header_name="X-API-Key") # type: ignore + + if not orch_response: + return command + + if "user_notification" in orch_response: + await self.browser_manager.notify_user(orch_response["user_notification"]) # type: ignore + if "is_terminating" in orch_response and orch_response["is_terminating"]: + logger.info("Orchestrator indicated command execution completed.") + return None + if "reformulated_query" in orch_response: + logger.info(f"Orchestrator reformulated command to: {orch_response['reformulated_query']}") + return orch_response["reformulated_query"] + + async def process_command(self, command: str): """ Processes a given command, coordinating with the Autogen wrapper for execution and handling special commands like 'exit'. @@ -99,6 +137,7 @@ async def process_command(self, command: str): Args: command (str): The command to process. 
""" + logger.info(f"Received command: {command}") if command.lower() == 'exit': await self.shutdown() return @@ -107,15 +146,30 @@ async def process_command(self, command: str): self.is_running = True start_time = time.time() current_url = await self.browser_manager.get_current_url() if self.browser_manager else None + self.browser_manager.ui_manager.clear_conversation_history() # type: ignore self.browser_manager.log_user_message(command) # type: ignore - + result = None + logger.info(f"Processing command: {command}") if self.autogen_wrapper: - await self.autogen_wrapper.process_command(command, current_url) + await self.browser_manager.update_processing_state("processing") # type: ignore + orchestrated_command = await self.__orchestrate_command(command) + if orchestrated_command is not None: + result = await self.autogen_wrapper.process_command(orchestrated_command, current_url) + else: + result = await self.autogen_wrapper.process_command(command, current_url) + + await self.browser_manager.update_processing_state("done") # type: ignore end_time = time.time() elapsed_time = round(end_time - start_time, 2) logger.info(f"Command \"{command}\" took: {elapsed_time} seconds.") await self.save_chat_messages() - await self.browser_manager.notify_user(f"Completed ({elapsed_time}s).") # type: ignore + if result is not None: + chat_history= result.chat_history # type: ignore + last_message = chat_history[-1] if chat_history else None # type: ignore + if last_message and "terminate" in last_message and last_message["terminate"]=="yes": + await self.browser_manager.notify_user(last_message, "answer") # type: ignore + + await self.browser_manager.notify_user(f"Task Completed ({elapsed_time}s).", "info") # type: ignore await self.browser_manager.command_completed(command, elapsed_time) # type: ignore self.is_running = False diff --git a/ae/core/ui_manager.py b/ae/core/ui_manager.py index f3da06a..b86ee9c 100644 --- a/ae/core/ui_manager.py +++ b/ae/core/ui_manager.py @@ -1,5 
+1,4 @@ -import json import os import traceback @@ -9,6 +8,7 @@ from ae.config import PROJECT_SOURCE_ROOT from ae.utils.js_helper import escape_js_message from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType class UIManager: @@ -24,6 +24,10 @@ class UIManager: """ overlay_is_collapsed: bool = True + + overlay_processing_state: str = "init" #init: initialised, processing: processing is ongoing, done: processing is done + overlay_show_details:bool = True + conversation_history:list[dict[str, str]] = [] __update_overlay_chat_history_running: bool = False @@ -51,10 +55,12 @@ async def handle_navigation(self, frame: Frame): # Inject the JavaScript code into the page await frame.evaluate(js_code) + js_bool = str(self.overlay_show_details).lower() if self.overlay_is_collapsed: - await frame.evaluate("showCollapsedOverlay();") + await frame.evaluate(f"showCollapsedOverlay('{self.overlay_processing_state}', {js_bool});") else: - await frame.evaluate("showExpandedOverlay();") + await frame.evaluate(f"showExpandedOverlay('{self.overlay_processing_state}', {js_bool});") + #update chat history in the overlay await self.update_overlay_chat_history(frame) @@ -87,6 +93,32 @@ def update_overlay_state(self, is_collapsed: bool): self.overlay_is_collapsed = is_collapsed + + async def update_overlay_show_details(self, show_details: bool, page: Page): + """ + Updates the state of the overlay to either show steps or not. + + Args: + show_details (bool): True to show steps, False to hide them. + """ + self.overlay_show_details = show_details + await self.update_overlay_chat_history(page) + + + async def update_processing_state(self, state: str, page: Page): + """ + Updates the processing state of the overlay. + + Args: + state (str): The processing state to update. 
+ """ + self.overlay_processing_state = state + try: + js_bool = str(self.overlay_is_collapsed).lower() + await page.evaluate(f"updateOverlayState('{self.overlay_processing_state}', {js_bool});") + except Exception as e: + logger.debug(f"JavaScript error: {e}") + async def update_overlay_chat_history(self, frame_or_page: Frame | Page): """ Updates the chat history in the overlay. If the overlay is expanded and not currently being updated, @@ -110,16 +142,32 @@ async def update_overlay_chat_history(self, frame_or_page: Frame | Page): await frame_or_page.evaluate("clearOverlayMessages();") for message in self.conversation_history: safe_message = escape_js_message(message["message"]) + safe_message_type = escape_js_message(message.get("message_type", MessageType.STEP.value)) if message["from"] == "user": await frame_or_page.evaluate(f"addUserMessage({safe_message});") else: - await frame_or_page.evaluate(f"addSystemMessage({safe_message});") + #choose which message types to be shown depending on UI setting + if self.overlay_show_details == False: # noqa: E712 + if message["message_type"] not in (MessageType.PLAN.value, MessageType.QUESTION.value, MessageType.ANSWER.value, MessageType.INFO.value): + continue + else: + if message["message_type"] not in (MessageType.PLAN.value, MessageType.QUESTION.value , MessageType.ANSWER.value, MessageType.INFO.value, MessageType.STEP.value): + continue + + js_code = f"addSystemMessage({safe_message}, is_awaiting_user_response=false, message_type={safe_message_type});" + await frame_or_page.evaluate(js_code) logger.debug("Chat history updated in overlay, removing update lock flag") except Exception: traceback.print_exc() finally: self.__update_overlay_chat_history_running = False + def clear_conversation_history(self): + """ + Clears the conversation history. 
+ """ + self.conversation_history = [] + self.add_default_system_messages() def get_conversation_history(self): """ @@ -130,6 +178,7 @@ def get_conversation_history(self): """ return self.conversation_history + def new_user_message(self, message: str): """ Adds a new user message to the conversation history. @@ -137,25 +186,26 @@ def new_user_message(self, message: str): Args: message (str): The message text to add. """ + self.conversation_history.append({"from":"user", "message":message}) - def new_system_message(self, message: str): + def new_system_message(self, message: str, type:MessageType=MessageType.STEP): """ Adds a new system message to the conversation history. Args: message (str): The message text to add. """ - self.conversation_history.append({"from":"system", "message":message}) + self.conversation_history.append({"from":"system", "message":message, "message_type":type.value}) + print(f"Adding system message: {message}") def add_default_system_messages(self): """ Adds default system messages to the conversation history to greet the user or provide initial instructions. 
""" - self.new_system_message(json.dumps("Agent-E at your service, what can I do for you?")) - + pass async def command_completed(self, page: Page, command: str, elapsed_time: float|None = None): """ diff --git a/ae/main.py b/ae/main.py index b0ebd40..cd1a4a3 100644 --- a/ae/main.py +++ b/ae/main.py @@ -3,5 +3,5 @@ from ae.core.system_orchestrator import SystemOrchestrator if __name__ == "__main__": - orchestrator = SystemOrchestrator(agent_scenario="user_proxy,browser_nav_agent",input_mode="GUI_ONLY") + orchestrator = SystemOrchestrator(agent_scenario="user,planner_agent,browser_nav_agent,browser_nav_executor",input_mode="GUI_ONLY") asyncio.run(orchestrator.start()) diff --git a/ae/ui/injectOverlay.js b/ae/ui/injectOverlay.js index 59ebf94..f707de3 100644 --- a/ae/ui/injectOverlay.js +++ b/ae/ui/injectOverlay.js @@ -6,20 +6,8 @@ function injectOveralyStyles() { let style = document.createElement('style'); // Set the styles style.textContent = ` - @font-face { - font-family: 'CircularXX'; - src: url('https://assets.website-files.com/627028e6193b2d840a066eab/627028e6193b2d9dd2066edf_CircularXXWeb-Book.woff2') format('woff2'); - font-weight: 400; - font-style: normal; - font-display: auto; -} -@font-face { - font-family: 'CircularXXLight'; - src: url('https://assets.website-files.com/627028e6193b2d840a066eab/627028e6193b2d710b066eda_CircularXXWeb-Light.woff2') format('woff2'); - font-weight: 300; - font-style: normal; - font-display: auto; -} +@import url(https://fonts.googleapis.com/earlyaccess/notosanssc.css); + ::-webkit-scrollbar { width: 6px; border: solid 3px transparent; @@ -38,91 +26,146 @@ function injectOveralyStyles() { background-color: rgba(255, 255, 255, 0.6); } - .disabled { - opacity: 0.95; + .agente-pre-line { + white-space: pre-line; !important; } - .pre-line { - white-space: pre-line; + #agente-closebutton{ + width:30px; + height:30px; + min-width:30px; + min-height:30px; + margin-left: auto; + color:darkgray; + cursor: pointer; + background: 
transparent; + transition: transform 0.2s ease; + border: None; + } + #agente-closebutton:hover{ + transform: scale(1.1); } - .enabled { - opacity: 1; + #agente-closebutton:active{ + transform: scale(0.8); } - #closebutton{ - width:25px; - height:25px; - min-width:25px; - min-height:25px; - position: absolute; - top: 10px; - right: 10px; - color:darkgray; - cursor: pointer; - border: 1px solid lightgray; - z-index: 20000001; - background: white; + @keyframes agente-gradient-animation { + 0% {background-position: 100% 0%} + 100% {background-position: 15% 100%} + } + @keyframes agente-rotate { + 100% { + transform: rotate(1turn); + } } - #closebutton:hover{ - border: 1px solid orange; - color:black; - font-weight: bold; + + @keyframes automation_highlight_fadeout_animation { + 0% { border-color: rgba(128, 0, 128, 1); } + 50% { border-color: rgba(128, 0, 128, 1); } + 100% { border-color: rgba(128, 0, 128, 0); } + } + + .agente-ui-automation-highlight { + border-width: 2px !important; + border-style: solid !important; + animation: automation_highlight_fadeout_animation 5s linear 1 forwards !important; + } + + .agente-processing{ + background: linear-gradient(90deg, + rgba(255, 0, 255, 1) 0%, /* Bright Magenta */ + rgba(0, 191, 255, 1) 100% /* Deep Sky Blue */ + ); + background-size: 100% 200%; + animation: agente-rotate 1s linear infinite; + } + + .agente-init{ + background: darkgray; + box-shadow: rgba(120, 120, 120, 0.3) 0px 0px 20px + } + + .agente-done{ + background: lightgreen; + } + + .agente-processingLine { + background: linear-gradient(45deg, + rgba(255, 0, 0, 1) 0%, /* Red */ + rgba(255, 127, 0, 1) 25%, /* Orange */ + rgba(0, 255, 0, 1) 50%, /* Green */ + rgba(0, 0, 255, 1) 75%, /* Blue */ + rgba(255, 0, 0, 1) 90%, /* Red */ + rgba(255, 0, 0, 1) 100% /* Red */ + ); + background-size: 500% 100%; + animation: agente-gradient-animation 3s linear infinite; + } + + .agente-initStateLine{ + background: lightgray; + } + + .agente-doneStateLine{ + background: 
lightgreen; } - .collapsed{ + .agente-collapsed{ cursor: pointer; background-color: rgba(0, 0, 0, 0.1); background-repeat: no-repeat; background-position: center; background-size: cover; - width: 5vh; - height: 5vh; + width: 6vh; + height: 6vh; border-radius: 50%; right: 1.5vw; bottom: 1.5vw; - padding: 0.5%; box-shadow: rgba(0, 0, 0, 0.3) 0px 0px 20px } - .chat-container { + .agente-chat-container { margin:1%,1%,1%,1%; - width: 25vw; - height:60vh; + width: 30vw; + min-width: 350px; + height:70vh; bottom: 2vh; position: relative; + display: flex; + flex-direction: column; top: 6%; - box-sizing: border-box; /* Include padding in the width and height calculations */ + padding: 1%; + box-sizing: border-box; } - .icon{ - width: 25px; - border-radius: 50%; - height: 25px; - } - - - - .chat-input{ + .agente-chat-input{ display: flex; flex-direction: row; - gap:2%; - justify-content: center; align-items: center; - width: 100%; - margin-top:2vh; + width: 95%; + margin-top:1.5vh; + } + + .agente-agent{ + justify-content: flex-start; + } + + .agente-user{ + justify-content: flex-end; } - #user-input { + #agente-user-input { flex: 1; padding: 3px 3px; - border: 1px solid #ccc; - border-radius: 3px; - width:80%; + border: transparent; + width:100%; resize: none; - font-family: 'CircularXX'; - font-size: 14px; + font-family: 'Noto Sans SC'; + font-size: 1.6vh; + min-font-size: 12px; + line-height: 1.5; display: flex; vertical-align: middle; text-align: middle; @@ -131,205 +174,248 @@ function injectOveralyStyles() { border-color: #ccc; background: white; color:black; - line-height: 1.2; min-height: calc(1.2em * 2); scrollbar-width: thin; } - #user-input:focus { + #agente-user-input:focus { outline: none !important; - border:1px solid orange; - box-shadow: 0 0 10px #719ECE; - } - #send-btn { - padding: 5px; - margin-left: 5px; - border: 1px solid #ccc; - border-radius: 3px; - cursor: pointer; - color:black; - opacity: 0.9; - background: white; - height:100%; - font-family: 
'CircularXX'; + border:0px solid transparent !important; + box-shadow: none !important; } - #send-btn:hover{ - background: orange; - opacity: 1; + #agente-send-btn { + cursor: pointer; + transition: transform 0.2s ease; + } + #agente-send-btn:hover{ + transform: scale(1.1); } - .highlight_overlay{ + .agente-highlight_overlay{ box-shadow: 1px 1px 1px 1px rgb(50 50 50 / 40%); - border-radius: 10px; - border: 1px solid #ccc; + border-radius: 16px; + border: 1px solid #E1DEE2; bottom:3px; right:5px; - padding: 1%; - padding-top:30px; - background: rgba(255, 255, 255, 1.0); + background: #FBFAFA; } - #chat-box { + + #agente-chat-box { overflow-y: auto; scrollbar-width: thin; height: 90%; - width:100%; display: flex; flex-direction: column; gap:1%; - margin:1%; + margin:1% 5%; padding-bottom:1%; margin-top:10%; } - #agentDriveAutoOverlay { + #agente-overlay { position: fixed; - min-width: 30px; - min-height: 30px; + min-width: 50px; + min-height: 50px; margin-left: auto; margin-right: auto; z-index:20000000; scrollbar-color: gray lightgray; margin-bottom: 1%; + display: flex; + flex-direction: column; } - .agent1{ - background: blueviolet; - border-radius: 50%; - } - - .agent2{ - background: rgba(150, 255, 150, 1); - border-radius: 50%; - } - .user{ - background: orange; - border-radius: 50%; - } - - .input-container { + .agente-input-container { display: flex; - padding: 0%; - height:8%; + flex-direction: column; + margin: 1% 3%; + padding: 1%; + height:20%; + background: white; + border: 1px solid #E1DEE2; + border-radius: 8px; } - .chat{ + .agente-chat{ width: 80%; color: black; overflow-wrap: break-word; - font-family: 'CircularXX'; - font-size: 14px; + font-family: 'Noto Sans SC'; + } - .agent1text{ + .agente-systemMessage{ text-align: left; justify-content: flex-start; - margin-right: auto; - margin-left: auto; - font-family: 'CircularXX'; - padding: 5%; + font-family: 'Noto Sans SC'; + padding: 2% 4%; + font-size: 1.5vh; + min-font-size: 12px; min-height: 30px; - 
background: linear-gradient(180deg, rgba(0, 0, 0, 0.04) 0%, rgba(0, 0, 0, 0.12) 100%); - box-shadow: 1px 1px 1px 1px rgb(150 150 150 / 60%); - padding-left: 10px; - border-radius: 20px; - border: 1px solid blueviolet; - width:72%; - } - .agent2text{ - text-align: left; - justify-content: flex-start; - margin-right: auto; - margin-left: auto; - font-family: 'CircularXX'; - padding: 5%; - min-height: 30px; - background: linear-gradient(180deg, rgba(0, 0, 0, 0.04) 0%, rgba(0, 0, 0, 0.12) 100%); - box-shadow: 1px 1px 1px 1px rgb(150 150 150 / 60%); - padding-left: 10px; - border-radius: 20px; - border: 1px solid rgba(150, 255, 150, 1); - width:72%; + background: #EEEEEF; + line-height: 1.7; + border-radius: 10px; + width:auto; + max-width: 90%; } - .usertext{ + .agente-usertext{ text-align: right; - justify-content: flex-start; - margin-right: auto; - margin-left: auto; - font-family: 'CircularXX'; - padding: 5%; + justify-content: flex-end; + align-items: flex-end; + font-family: 'Noto Sans SC'; + font-size: 1.5vh; + min-font-size: 12px; + padding: 2% 4%; + line-height: 1.7; min-height: 30px; - width:72%; - background: linear-gradient(180deg, rgba(0, 0, 0, 0.04) 0%, rgba(0, 0, 0, 0.20) 100%) - /* White Glass Effect */ - box-shadow: 8px 8px 16px rgba(0, 0, 0, 0.12), inset 1px 1px 2px rgba(255, 255, 255, 0.64), inset -1px -1px 2px rgba(255, 255, 255, 0.4); - border-radius: 20px; + width:auto; + background: #ECEBF3; + border-radius: 10px; color: black; - border: 1px solid orange; } - - @keyframes automation_blink { - 0% { border-color: rgba(128, 0, 128, 1); } - 50% { border-color: rgba(128, 0, 128, 1); } - 100% { border-color: rgba(128, 0, 128, 0); } + .agente-agentstep{ + color: #4B4B4B; } - - .ui_automation_pulsate { - border-width: 2px !important; - border-style: solid !important; - animation: automation_blink 5s linear 1 forwards !important; + .agente-agentplan{ + color: #4B4B4B; + } + .agente-agentanswer{ + color: black; } + + + .agente-toggle { + 
-webkit-appearance: none; + -moz-appearance: none; + appearance: none; + margin: 0; + display: inline-block; + position: relative; + border-radius: 50px; + overflow: hidden; + outline: none; + border: none; + cursor: pointer; + background-color: #E1DEE2; + transition: background-color ease 0.3s; + align-self: center; +} +.agente-toggle:focus { + border: none; !important; + outline: none; !important; +} +.agente-toggle:before { + content: ""; + display: block; + position: absolute; + z-index: 2; + width: 20px; + height: 20px; + background: #fff; + left: 2px; + top: 2px; + border-radius: 50%; + color: #fff; + text-shadow: -1px -1px rgba(0,0,0,0.15); + white-space: nowrap; + box-shadow: 0 1px 2px rgba(0,0,0,0.2); + transition: all cubic-bezier(0.3, 1.5, 0.7, 1) 0.3s; +} + +.agente-toggle:checked { + background-color: #786E96; +} + +.agente-toggle:checked:before { + left: 20px; +} `; // Append the style element to the head of the document document.head.appendChild(style); } let savedSelection = null; +let show_details = true; -function showCollapsedOverlay() { + +function showCollapsedOverlay(processing_state = "processing", steps) { + show_details = steps; removeOverlay(); window.overlay_state_changed(true); - let newDiv = document.createElement("div"); - newDiv.id = "agentDriveAutoOverlay"; - newDiv.classList.add("collapsed"); - newDiv.setAttribute("aria-hidden", "true"); - - let svg = `<svg xmlns="http://www.w3.org/2000/svg" height="800" width="800" viewBox="0 0 64 64" xml:space="preserve"><style>.st3{fill:#fff}.st4{fill:#4f5d73}</style><g id="Layer_1"><circle cx="32" cy="32" r="32" fill="#9c27b0"/><path d="M52 32c0-9.9-9-18-20-18s-20 8.1-20 18c0 9.6 8.3 17.4 18.8 17.9.7 3.7 1.2 6.1 1.2 6.1s5-3 9.6-8.2C47.8 44.7 52 38.8 52 32z" fill="#231f20" opacity=".2"/><path class="st3" d="M49 28.8C49 43.8 32 54 32 54s-9.4-42 0-42 17 7.5 17 16.8z" fill="#000000"/><ellipse class="st3" cx="32" cy="30" rx="20" ry="18" fill="#000000"/><circle class="st4" cx="32" cy="30" r="2" 
fill="#000000"/><circle class="st4" cx="40" cy="30" r="2" fill="#000000"/><circle class="st4" cx="24" cy="30" r="2" fill="#000000"/></g></svg>`; - let encodedSvg = encodeURIComponent(svg); + let collapsed_agente = document.createElement("div"); + collapsed_agente.id = "agente-overlay"; + collapsed_agente.classList.add("agente-collapsed"); + collapsed_agente.style.backgroundColor = "transparent"; + collapsed_agente.setAttribute("aria-hidden", "true"); + collapsed_agente.style.justifyContent = "center"; + let wrapper = document.createElement("div"); + wrapper.style.position = "relative"; + wrapper.style.width = "100%"; + wrapper.style.height = "100%"; + wrapper.style.justifyContent = "center"; + let logodiv= document.createElement("div"); + logodiv.style.width = "90%"; + logodiv.style.height = "90%"; + logodiv.style.left = "5%"; + logodiv.style.top = "5%"; + let borderdiv = document.createElement("div"); + borderdiv.style.width = "100%"; + borderdiv.style.height = "100%"; + borderdiv.style.borderRadius = "50%"; + + let logo = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><rect x="6.5" y="7.5" width="11" height="11" rx="0.5" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 6 10)" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 20 10)" stroke="#827C8C"/><path d="M12 4V7.5" stroke="#827C8C" stroke-linecap="round"/><rect x="8.5" y="11.5" width="7" height="3" rx="1.5" stroke="#827C8C"/></svg>`; + let encodedSvg = encodeURIComponent(logo); let svgUrl = 'data:image/svg+xml;utf8,' + encodedSvg; - - document.body.appendChild(newDiv); - let element = document.getElementById('agentDriveAutoOverlay'); - element.style.backgroundImage = `url("${svgUrl}")`; - document.getElementById('agentDriveAutoOverlay').addEventListener('mouseover', function () { + logodiv.style.backgroundImage = `url("${svgUrl}")`; + logodiv.style.backgroundRepeat 
= "no-repeat"; + logodiv.style.backgroundSize = "contain"; + logodiv.style.borderRadius = "50%"; + logodiv.style.backgroundPosition = "center"; + logodiv.style.backgroundColor = "white"; + logodiv.style.alignSelf = "center"; + borderdiv.style.position = "absolute"; + borderdiv.style.top = "0"; + borderdiv.style.left = "0"; + borderdiv.id="AgentEOverlayBorder"; + logodiv.style.position = "absolute"; + logodiv.style.justifySelf = "center"; + wrapper.appendChild(borderdiv); + wrapper.appendChild(logodiv); + collapsed_agente.appendChild(wrapper); + document.body.appendChild(collapsed_agente); + + updateOverlayState(processing_state, true); + + let element = document.getElementById('agente-overlay'); + document.getElementById('agente-overlay').addEventListener('mouseover', function () { this.style.transform = 'scale(1.1)'; }); - document.getElementById('agentDriveAutoOverlay').addEventListener('mouseout', function () { + document.getElementById('agente-overlay').addEventListener('mouseout', function () { this.style.transform = 'scale(1)'; }); - document.getElementById('agentDriveAutoOverlay').addEventListener('click', function () { - showExpandedOverlay(); + document.getElementById('agente-overlay').addEventListener('click', function () { + let ui_state = document.getElementById("AgentEOverlayBorder").classList.contains("agente-init") ? "init" : document.getElementById("AgentEOverlayBorder").classList.contains("agente-processing") ? 
"processing" : "done"; + showExpandedOverlay(ui_state, show_details); }); } function removeOverlay() { - let element = document.getElementById("agentDriveAutoOverlay"); + let element = document.getElementById("agente-overlay"); if (element) { element.parentNode.removeChild(element); } } - -function clearOverlayMessages() { +function clearOverlayMessages(keep_default=false) { try { - let chatBox = document.getElementById('chat-box'); + let chatBox = document.getElementById('agente-chat-box'); if (!chatBox) { return; } - console.log("Clearing chat box"); while (chatBox.firstChild) { chatBox.removeChild(chatBox.firstChild); } @@ -339,78 +425,235 @@ function clearOverlayMessages() { } } -function createIcon(className) { - let icon = document.createElement("div"); - icon.className = `icon ${className}`; - return icon; +function updateOverlayState(processing_state, is_collapsed) +{ + if (is_collapsed) { + let borderdiv = document.getElementById("AgentEOverlayBorder"); + if (processing_state === "init"){ + borderdiv.classList.add("agente-init"); + borderdiv.classList.remove("agente-processing"); + borderdiv.classList.remove("agente-done"); + } + else if (processing_state === "processing"){ + borderdiv.classList.remove("agente-init"); + borderdiv.classList.add("agente-processing"); + borderdiv.classList.remove("agente-done"); + } + else if (processing_state === "done"){ + borderdiv.classList.remove("agente-init"); + borderdiv.classList.remove("agente-processing"); + borderdiv.classList.add("agente-done"); + } + } else { + let animation = document.getElementById("AgentEExpandedAnimation"); + if (processing_state === "init"){ + animation.classList.remove("agente-processingLine"); + animation.classList.add("agente-initStateLine"); + animation.classList.remove("agente-doneStateLine"); + enableOverlay(); + } + else if (processing_state === "processing"){ + animation.classList.add("agente-processingLine"); + animation.classList.remove("agente-initStateLine"); + 
animation.classList.remove("agente-doneStateLine"); + disableOverlay(); + } + else if (processing_state === "done"){ + animation.classList.remove("agente-processingLine"); + animation.classList.remove("agente-initStateLine"); + animation.classList.add("agente-doneStateLine"); + enableOverlay(); + } + } } -function showExpandedOverlay() { +function showExpandedOverlay(processing_state = "init", show_steps=true) { + ui_state = processing_state; + show_details = show_steps; + let agente_logo = `<svg width="85" height="12" viewBox="0 0 85 12" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M0 11.8027L3.43562 0.213699H8.35069L11.8027 11.8027H9.3863L8.23562 7.85753H3.53425L2.38356 11.8027H0ZM4.10959 5.86849H7.66027L6.18082 0.80548H5.58904L4.10959 5.86849Z" fill="#6B6673"/><path d="M19.0946 12C15.6096 12 13.7028 9.56712 13.7028 6.09863C13.7028 2.4 15.9055 0 19.4562 0C22.4151 0 24.5685 1.70959 24.9631 4.35616H22.6124C22.3822 2.87671 21.2151 1.9726 19.5713 1.9726C17.3192 1.9726 16.0535 3.58356 16.0535 6.09863C16.0535 8.35068 17.0726 10.011 19.637 10.011C21.7576 10.011 22.974 8.94247 22.974 7.15068H19.374V5.40822H23.9768C24.8151 5.40822 25.2918 5.85205 25.2918 6.69041V11.8027H23.0069V10.7671L23.4672 8.92603H22.8589C22.8754 9.6 22.4973 12 19.0946 12Z" fill="#6B6673"/><path d="M28.7192 11.8027V0.213699H37.3987V2.20274H31.0206V5.04658H36.5768V6.95342H31.0206V9.8137H37.3987V11.8027H28.7192Z" fill="#6B6673"/><path d="M40.533 11.8027V0.213699H45.0536L49.1631 11.211H49.7385L49.3275 9.76438V0.213699H51.6125V11.8027H47.0919L42.9823 0.80548H42.3905L42.8179 2.25205V11.8027H40.533Z" fill="#6B6673"/><path d="M54.4378 0.213699H64.4159V2.20274H60.5693V11.8027H58.2844V2.20274H54.4378V0.213699Z" fill="#6B6673"/><path d="M63.9401 6.6411H72.4551V8.30137H63.9401V6.6411Z" fill="#6B6673"/><path d="M75.3643 11.8027V0.213699H84.0438V2.20274H77.6657V5.04658H83.2219V6.95342H77.6657V9.8137H84.0438V11.8027H75.3643Z" fill="#6B6673"/></svg>`; + let close_icon = `<svg width="24" height="24" 
viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M5 10L10 10L10 5" stroke="#827C8C"/><path d="M19 14L14 14L14 19" stroke="#827C8C"/><path d="M14 5L14 10L19 10" stroke="#827C8C"/><path d="M10 19L10 14L5 14" stroke="#827C8C"/><path d="M6.35355 5.64645C6.15829 5.45118 5.84171 5.45118 5.64645 5.64645C5.45118 5.84171 5.45118 6.15829 5.64645 6.35355L6.35355 5.64645ZM10.3536 9.64645L6.35355 5.64645L5.64645 6.35355L9.64645 10.3536L10.3536 9.64645Z" fill="#827C8C"/><path d="M17.6464 18.3536C17.8417 18.5488 18.1583 18.5488 18.3536 18.3536C18.5488 18.1583 18.5488 17.8417 18.3536 17.6464L17.6464 18.3536ZM13.6464 14.3536L17.6464 18.3536L18.3536 17.6464L14.3536 13.6464L13.6464 14.3536Z" fill="#827C8C"/><path d="M18.3536 6.35355C18.5488 6.15829 18.5488 5.84171 18.3536 5.64645C18.1583 5.45119 17.8417 5.45119 17.6464 5.64645L18.3536 6.35355ZM14.3536 10.3536L18.3536 6.35355L17.6464 5.64645L13.6464 9.64645L14.3536 10.3536Z" fill="#827C8C"/><path d="M5.64645 17.6464C5.45118 17.8417 5.45118 18.1583 5.64645 18.3536C5.84171 18.5488 6.15829 18.5488 6.35355 18.3536L5.64645 17.6464ZM9.64645 13.6464L5.64645 17.6464L6.35355 18.3536L10.3536 14.3536L9.64645 13.6464Z" fill="#827C8C"/></svg>`; + let icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><rect x="6.5" y="7.5" width="11" height="11" rx="0.5" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 6 10)" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 20 10)" stroke="#827C8C"/><path d="M12 4V7.5" stroke="#827C8C" stroke-linecap="round"/><rect x="8.5" y="11.5" width="7" height="3" rx="1.5" stroke="#827C8C"/></svg>`; removeOverlay(); window.overlay_state_changed(false); - console.log("showing expanded overlay"); let newDiv = document.createElement("div"); - newDiv.id = "agentDriveAutoOverlay"; - newDiv.classList.add("highlight_overlay"); - 
newDiv.classList.add("agentDriveAutoOverlay"); + newDiv.id = "agente-overlay"; + newDiv.classList.add("agente-highlight_overlay"); newDiv.setAttribute("aria-hidden", "true"); + newDiv.setAttribute("tabindex", "0"); + + let header = document.createElement("div"); + header.style.display = "flex"; + header.style.flexDirection = "row"; + header.style.margin = "4%"; + + let logoIcon= document.createElement("div"); + logoIcon.style.width = "25px"; + logoIcon.style.height = "25px"; + logoIcon.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(icon)}')`; + logoIcon.style.backgroundRepeat = "no-repeat"; + logoIcon.style.backgroundSize = "contain"; + logoIcon.style.backgroundPosition = "bottom"; + logoIcon.style.order = 1; + logoIcon.style.alignSelf = "flex-end"; + logoIcon.style.marginRight = "1%"; + + let logoDiv = document.createElement("div"); + logoDiv.style.width = "100px"; + logoDiv.style.height = "25px"; + logoDiv.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(agente_logo)}')`; + logoDiv.style.backgroundRepeat = "no-repeat"; + logoDiv.style.backgroundSize = "contain"; + logoDiv.style.backgroundPosition = "bottom"; + // Style the logoDiv and button + logoDiv.style.order = 1; + let closeButton = document.createElement("button"); - closeButton.id = "closebutton"; - closeButton.textContent = "X"; + closeButton.id = "agente-closebutton"; + closeButton.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(close_icon)}')`; + closeButton.style.backgroundRepeat = "no-repeat"; + closeButton.style.backgroundSize = "contain"; + closeButton.style.backgroundPosition = "bottom"; closeButton.onclick = function () { - showCollapsedOverlay(); + let ui_state = document.getElementById("AgentEExpandedAnimation").classList.contains("agente-initStateLine") ? "init" : document.getElementById("AgentEExpandedAnimation").classList.contains("agente-processingLine") ? 
"processing" : "done"; + showCollapsedOverlay(ui_state, show_details); }; - + closeButton.style.order = 3; + header.appendChild(logoIcon); + header.appendChild(logoDiv); + let animation = document.createElement("div"); + animation.id = "AgentEExpandedAnimation"; + animation.style.height = "2px"; + animation.style.width = "100%"; + + header.appendChild(closeButton); // Append the close button to the newDiv - newDiv.appendChild(closeButton); + newDiv.appendChild(header); + + newDiv.appendChild(animation); let chatContainer = document.createElement("div"); - chatContainer.className = "chat-container"; + chatContainer.className = "agente-chat-container"; let chatBox = document.createElement("div"); - chatBox.id = "chat-box"; + chatBox.id = "agente-chat-box"; let chatInput = document.createElement("div"); - chatInput.className = "chat-input"; - - let iconAgent1 = createIcon("agent1"); - + chatInput.className = "agente-chat-input"; chatBox.appendChild(chatInput); let inputContainer = document.createElement("div"); - inputContainer.className = "input-container"; - + inputContainer.className = "agente-input-container"; + inputContainer.id = "agente-input-container"; let userInput = document.createElement("textarea"); - userInput.id = "user-input"; - userInput.placeholder = "Type the task for the agent..."; + userInput.id = "agente-user-input"; + userInput.placeholder = "What can I help you solve today?"; + userInput.addEventListener('input', function(event) { + let text = event.target.value; + if (text.trim() == "") { + let button_disabled_svg =`<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg"><rect width="40" height="40" rx="4" fill="#EEEEEF"/><path d="M15 20H25" stroke="#AEA9B4" stroke-width="1.5"/><path d="M20 15L25 20L20 25" stroke="#AEA9B4" stroke-width="1.5"/></svg>`; + let sendBtn = document.getElementById('agente-send-btn'); + sendBtn.style.backgroundImage = 
`url('data:image/svg+xml;utf8,${encodeURIComponent(button_disabled_svg)}')`; + } + else{ + let button_enabled_svg= `<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg"><rect width="40" height="40" rx="4" fill="#252539"/><path d="M15 20H25" stroke="white" stroke-width="1.5"/><path d="M20 15L25 20L20 25" stroke="white" stroke-width="1.5"/></svg>`; + let sendBtn = document.getElementById('agente-send-btn'); + sendBtn.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(button_enabled_svg)}')`; + } + }); + let userinput_footer = document.createElement("div"); + userinput_footer.style.display = "flex"; + userinput_footer.style.flexDirection = "row"; + userinput_footer.style.justifyContent = "space-between"; + userinput_footer.style.alignItems = "center"; + userinput_footer.style.height = "40%"; + userinput_footer.style.margin = "2% 1%"; + userinput_footer.id="userinput_section" + + let toggleLabel = document.createElement("label"); // Create a new label element + toggleLabel.textContent = "Show Details"; // Set the text content of the label + toggleLabel.style.color = "#6B6673"; // Set the color of the label + toggleLabel.style.fontFamily = "Noto Sans SC"; // Set the font of the label + toggleLabel.style.fontSize = "14px"; // Set the font size of the label + toggleLabel.style.fontWeight = "400"; // Set the font weight of the label + toggleLabel.style.margin = "0px"; // Add some margin to the right of the label + toggleLabel.style.marginRight = "10px"; // Add some margin to the right of the label + + let toggleSwitch = document.createElement("input"); + + toggleSwitch.type = "checkbox"; + toggleSwitch.className = "agente-toggle"; + toggleSwitch.style.width = "44px"; + toggleSwitch.style.height = "24px"; + toggleSwitch.style.margin = "0px"; + + if (show_details){ + toggleSwitch.checked = true; + } + else{ + toggleSwitch.checked = false; + } + + toggleSwitch.addEventListener('change', function() { + 
if(this.checked) { + show_details = true; + window.show_steps_state_changed(true) + } else { + show_details = false; + window.show_steps_state_changed(false) + } +}); - let sendBtn = document.createElement("button"); - sendBtn.id = "send-btn"; - sendBtn.textContent = "Send"; + let sendicon =`<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg"><rect width="40" height="40" rx="4" fill="#EEEEEF"/><path d="M15 20H25" stroke="#AEA9B4" stroke-width="1.5"/><path d="M20 15L25 20L20 25" stroke="#AEA9B4" stroke-width="1.5"/></svg>`; + let sendBtn = document.createElement("div"); + sendBtn.id = "agente-send-btn"; + sendBtn.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(sendicon)}')`; + sendBtn.style.backgroundRepeat = "no-repeat"; + sendBtn.style.backgroundSize = "contain"; + sendBtn.style.backgroundPosition = "right"; + sendBtn.style.width = "8%"; + sendBtn.style.height = "100%"; + sendBtn.style.marginLeft = "auto"; + + userinput_footer.appendChild(toggleLabel); // Add the label to the div + userinput_footer.appendChild(toggleSwitch); + userinput_footer.appendChild(sendBtn); inputContainer.appendChild(userInput); - inputContainer.appendChild(sendBtn); + inputContainer.appendChild(userinput_footer); chatContainer.appendChild(chatBox); chatContainer.appendChild(inputContainer); newDiv.appendChild(chatContainer); - document.body.appendChild(newDiv); + let disclaimer = document.createElement("p"); + disclaimer.style.fontFamily = "Noto Sans SC"; + disclaimer.style.fontSize = "12px"; + disclaimer.style.color = "#6B6673"; + disclaimer.style.alignSelf = "center"; + disclaimer.style.position = "absolute"; + disclaimer.style.bottom = "0%"; + disclaimer.style.margin = "0% 0% 1% 0%"; + disclaimer.textContent = "Agent-E may make mistakes. 
Verify key info."; + + newDiv.appendChild(disclaimer); - document.getElementById('send-btn').addEventListener('click', function () { - let task = document.getElementById('user-input').value - if (task && !isDisabled()) { + document.body.appendChild(newDiv); + updateOverlayState(processing_state, false); + document.getElementById('agente-send-btn').addEventListener('click', function () { + let task = document.getElementById('agente-user-input').value + let task_trimmed = task.trim(); + if (task_trimmed && !isDisabled() && task_trimmed.length > 0) { if (awaitingUserResponse) { addUserMessage(task); - document.getElementById('user-input').value = ""; + document.getElementById('agente-user-input').value = ""; } else { - console.log(`Sending ${task} to server`); - + clearOverlayMessages(); addUserMessage(task); + disableOverlay(); window.process_task(task) - document.getElementById('user-input').value = ""; + document.getElementById('agente-user-input').value = ""; } } else { @@ -421,9 +664,8 @@ function showExpandedOverlay() { userInput.addEventListener('focus', function() { if (window.getSelection().rangeCount > 0) { let selectedText = window.getSelection().toString(); - console.log(selectedText); if (selectedText) { - document.getElementById('user-input').value = selectedText + '\n'; + document.getElementById('agente-user-input').value = selectedText + '\n'; setTimeout(function() { userInput.selectionStart = userInput.selectionEnd = userInput.value.length; userInput.scrollTop = userInput.scrollHeight; @@ -441,12 +683,12 @@ userInput.addEventListener('blur', function() { } }); - document.getElementById('user-input').addEventListener('keydown', function (event) { + document.getElementById('agente-user-input').addEventListener('keydown', function (event) { // Check if the pressed key is the Enter key if (event.key === "Enter") { event.preventDefault(); - let targetElement = document.getElementById('send-btn'); + let targetElement = 
document.getElementById('agente-send-btn'); // Create a new click event let clickEvent = new MouseEvent('click', { @@ -463,46 +705,48 @@ userInput.addEventListener('blur', function() { function focusOnOverlayInput() { - document.getElementById('user-input').focus(); + document.getElementById('agente-user-input').focus(); } -function addMessage(message, sender) { - //console.log(`Adding ${sender} message: ${message}`); +function addMessage(message, sender, message_type = "plan") { let newDiv = document.createElement("div"); - newDiv.classList.add("chat-input"); - - let iconDiv1 = document.createElement("div"); - iconDiv1.classList.add("icon"); - + newDiv.classList.add("agente-chat-input"); let chatDiv = document.createElement("div"); - chatDiv.classList.add("chat"); - - let iconDiv2 = document.createElement("div"); - iconDiv2.classList.add("icon"); + chatDiv.classList.add("agente-chat"); - newDiv.appendChild(iconDiv1); - newDiv.appendChild(chatDiv); - newDiv.appendChild(iconDiv2); let parsedMessage = message; try { parsedMessage = JSON.parse(message); } catch (e) { - //console.log("Message is not in JSON format, using original message."); + console.log("Message is not in JSON format, using original message."); } // Customize based on the sender if (sender === "system") { - iconDiv1.classList.add("agent1"); - chatDiv.classList.add("agent1text", "pre-line"); - chatDiv.innerText = parsedMessage; + newDiv.classList.add("agente-agent"); + chatDiv.classList.add("agente-systemMessage", "agente-pre-line"); + if (message_type === "step") { + chatDiv.classList.add("agente-agentstep"); + } + else if (message_type === "plan" || message_type === "question") { + chatDiv.classList.add("agente-agentplan"); + } + + else if (message_type === "answer") { + chatDiv.classList.add("agente-agentanswer"); + } + if ((message_type === "info" && message.includes("Task Completed")) || message_type==="question") { + enableOverlay(); + } + chatDiv.textContent = parsedMessage; } else if (sender 
=== "user") { - iconDiv2.classList.add("user"); - chatDiv.classList.add("usertext", "pre-line"); - chatDiv.innerText = parsedMessage; + newDiv.classList.add("agente-user") + chatDiv.classList.add("agente-usertext", "agente-pre-line"); + chatDiv.textContent = parsedMessage; } - - let chatBox = document.getElementById('chat-box'); + newDiv.appendChild(chatDiv); + let chatBox = document.getElementById('agente-chat-box'); chatBox.appendChild(newDiv); chatBox.scrollTop = chatBox.scrollHeight; newDiv.scrollIntoView({ behavior: 'instant' }); @@ -512,35 +756,42 @@ function addMessage(message, sender) { // Notify the server that the user has responded to the agent's prompt window.user_response(message); } -} +} -function addSystemMessage(message, is_awaiting_user_response = false) { - awaitingUserResponse = is_awaiting_user_response; - addMessage(message, "system"); +function addSystemMessage(message, is_awaiting_user_response = false, message_type = "plan") { + // Function to actually add the message + function executeAddMessage() { + awaitingUserResponse = is_awaiting_user_response; + addMessage(message, "system", message_type); + } + requestAnimationFrame(executeAddMessage); } function addUserMessage(message) { addMessage(message, "user"); } - function disableOverlay() { - let element = document.getElementById("agentDriveAutoOverlay"); - element.classList.remove("enabled"); - element.classList.add("disabled"); + let input_field= document.getElementById("agente-user-input"); + if(input_field){ + input_field.placeholder = "Processing..."; + } } function isDisabled() { - let element = document.getElementById("agentDriveAutoOverlay"); - return element.classList.contains("disabled"); + let input_field= document.getElementById("agente-user-input"); + if(input_field){ + return input_field.placeholder === "Processing..."; + } } + function enableOverlay() { - let element = document.getElementById("agentDriveAutoOverlay"); - element.classList.add("enabled"); - 
element.classList.remove("disabled"); - document.getElementById('user-input').focus(); + let input_field= document.getElementById("agente-user-input"); + if(input_field){ + input_field.placeholder = "What can I help you solve today?"; + } } function commandExecutionCompleted() { diff --git a/ae/user_preferences/user_preferences.txt b/ae/user_preferences/user_preferences.txt index 3065eee..0bac996 100644 --- a/ae/user_preferences/user_preferences.txt +++ b/ae/user_preferences/user_preferences.txt @@ -8,4 +8,5 @@ Email: myemail@gmail.com Phone Number: 123-456-7890 Here are some of my preferences: Shopping Preferences: www.amazon.com -Favorite news source: www.bbc.com \ No newline at end of file +Favorite news source: www.bbc.com +Favorite flight booking site to use with every flight related query: https://www.google.com/travel/flights \ No newline at end of file diff --git a/ae/utils/anthropic_llm_helper.py b/ae/utils/anthropic_llm_helper.py index a3480fb..6fbc870 100644 --- a/ae/utils/anthropic_llm_helper.py +++ b/ae/utils/anthropic_llm_helper.py @@ -1,9 +1,9 @@ -import asyncio -from anthropic import AsyncAnthropic +import os + import anthropic +from anthropic import AsyncAnthropic from dotenv import load_dotenv -import os -from ae.core.prompts import LLM_PROMPTS + class AnthropicLLMHelper: def __init__(self): @@ -14,7 +14,7 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis formatted_user_msgs: list[dict[str, str]] = [] for user_msg in user_msgs: formatted_user_msgs.append({"type": "text", "text": user_msg}) - + try: message = await self.client.messages.create( model=model_name, @@ -24,8 +24,8 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis messages=[ { "role": "user", - "content": formatted_user_msgs - + "content": formatted_user_msgs # type: ignore + } ] ) @@ -34,18 +34,19 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis except anthropic.APIConnectionError as 
e: print("The server could not be reached") print(e.__cause__) # an underlying Exception, likely raised within httpx. - raise Exception(f"Calling {model_name} LLM failed. The server could not be reached. Error: {e}") + raise Exception(f"Calling {model_name} LLM failed. The server could not be reached. Error: {e}") # noqa: B904 except anthropic.RateLimitError as e: print("A 429 status code was received; we should back off a bit.") - raise Exception(f"Calling {model_name} LLM failed. Rate limit error. Error: {e}") + raise Exception(f"Calling {model_name} LLM failed. Rate limit error. Error: {e}") # noqa: B904 except anthropic.APIStatusError as e: print(e.status_code) print(e.response) - raise Exception(f"Calling {model_name} LLM failed. Error: {e}") + raise Exception(f"Calling {model_name} LLM failed. Error: {e}") # noqa: B904 # async def main(): +# from ae.core.prompts import LLM_PROMPTS # helper = AnthropicLLMHelper() # response = await helper.get_chat_completion_response_async(LLM_PROMPTS["SKILLS_HARVESTING_PROMPT"], ["What is the weather like today?"], temperature=0, max_tokens=4000) # print("*******\nResponse: ", response, "\n*******\n") -# asyncio.run(main()) \ No newline at end of file +# asyncio.run(main()) diff --git a/ae/utils/autogen_sequential_function_call.py b/ae/utils/autogen_sequential_function_call.py new file mode 100644 index 0000000..1ef580e --- /dev/null +++ b/ae/utils/autogen_sequential_function_call.py @@ -0,0 +1,84 @@ + +import asyncio +import inspect +from typing import Any + +from autogen import Agent # type: ignore +from autogen import UserProxyAgent # type: ignore + + +class UserProxyAgent_SequentialFunctionExecution(UserProxyAgent): + def __init__(self, *args, **kwargs): # type: ignore + super().__init__(*args, **kwargs) # type: ignore + self.register_reply(Agent, UserProxyAgent_SequentialFunctionExecution.sequential_generate_tool_calls_reply) # type: ignore + + + def sequential_generate_tool_calls_reply( # type: ignore + self, + 
messages: list[dict] | None = None, # type: ignore + sender: Agent | None = None, + config: Any | None = None, + ) -> tuple[bool, dict[str, Any] | None]: + """Generate a reply using tool call.""" + if config is None: + config = self + if messages is None: + messages = self._oai_messages[sender] # type: ignore + message = messages[-1] # type: ignore + tool_returns = [] + skip_flag:bool = False + for tool_call in message.get("tool_calls", []): # type: ignore + function_call = tool_call.get("function", {}) # type: ignore + func = self._function_map.get(function_call.get("name", None), None) # type: ignore + func_return = None + if inspect.iscoroutinefunction(func): # type: ignore + try: + # get the running loop if it was already created + loop = asyncio.get_running_loop() + close_loop = False + except RuntimeError: + # create a loop if there is no running loop + loop = asyncio.new_event_loop() + close_loop = True + if (not skip_flag): + _, func_return = loop.run_until_complete(self.a_execute_function(function_call)) # type: ignore + if close_loop: + loop.close() + else: + if (not skip_flag): + _, func_return = self.execute_function(function_call) # type: ignore + if func_return is None: # type: ignore + if skip_flag: + content = "VERY IMPORTANT: This function could not be executed since previous function resulted in a Webpage change. You must get all_fields DOM and repeat the function if needed." 
+ else: + content = "" + else: + content = func_return.get("content", "") # type: ignore + + if content is None: + content = "" + + if ("as a consequence of this action" in content.lower()): # type: ignore + skip_flag = True + + tool_call_id = tool_call.get("id", None) # type: ignore + if tool_call_id is not None: + tool_call_response = { # type: ignore + "tool_call_id": tool_call_id, + "role": "tool", + "content": content, + } + else: + tool_call_response = { # type: ignore + "role": "tool", + "content": content, + } + tool_returns.append(tool_call_response) # type: ignore + + if tool_returns: + return True, { + "role": "tool", + "tool_responses": tool_returns, + "content": "\n\n".join([self._str_for_tool_response(tool_return) for tool_return in tool_returns]), # type: ignore + } + return False, None diff --git a/ae/utils/dom_helper.py b/ae/utils/dom_helper.py index d7d09a1..11ab38b 100644 --- a/ae/utils/dom_helper.py +++ b/ae/utils/dom_helper.py @@ -1,6 +1,7 @@ import asyncio -from playwright.async_api import ElementHandle, Page +from playwright.async_api import ElementHandle +from playwright.async_api import Page from ae.utils.logger import logger @@ -31,7 +32,7 @@ async def get_element_outer_html(element: ElementHandle, page: Page, element_tag """ tag_name: str = element_tag_name if element_tag_name else await page.evaluate("element => element.tagName.toLowerCase()", element) - attributes_of_interest: list[str] = ['id', 'name', 'aria-label', 'placeholder', 'href', 'src', 'aria-autocomplete', 'role', 'type', + attributes_of_interest: list[str] = ['id', 'name', 'aria-label', 'placeholder', 'href', 'src', 'aria-autocomplete', 'role', 'type', 'data-testid', 'value', 'selected', 'aria-labelledby', 'aria-describedby', 'aria-haspopup'] opening_tag: str = f'<{tag_name}' diff --git a/ae/utils/dom_mutation_observer.py b/ae/utils/dom_mutation_observer.py index 0748887..95a6f5e 100644 --- a/ae/utils/dom_mutation_observer.py +++ b/ae/utils/dom_mutation_observer.py @@ -1,57 
+1,68 @@ +import asyncio import json +from typing import Callable # noqa: UP035 + from playwright.async_api import Page -from typing import List, Callable -from playwright.async_api import Page -import asyncio # Create an event loop loop = asyncio.get_event_loop() -DOM_change_callback: List[Callable[[str], None]] = [] +DOM_change_callback: list[Callable[[str], None]] = [] def subscribe(callback: Callable[[str], None]) -> None: - DOM_change_callback.append(callback) + DOM_change_callback.append(callback) def unsubscribe(callback: Callable[[str], None]) -> None: DOM_change_callback.remove(callback) async def add_mutation_observer(page:Page): - """ - Adds a mutation observer to the page to detect changes in the DOM. + """ + Adds a mutation observer to the page to detect changes in the DOM. When changes are detected, the observer calls the dom_mutation_change_detected function in the browser context. This changes can be detected by subscribing to the dom_mutation_change_detected function by individual skills. - Current implementation only detects when a new node is added to the DOM. + Current implementation only detects when a new node is added to the DOM. However, in many cases, the change could be a change in the style or class of an existing node (e.g. toggle visibility of a hidden node). 
""" - await page.evaluate(""" - console.log('Adding a mutation observer for DOM changes'); - new MutationObserver((mutationsList, observer) => { - let changes_detected = []; - for(let mutation of mutationsList) { - if (mutation.type === 'childList') { - let allAddedNodes=mutation.addedNodes; - for(let node of allAddedNodes) { - if(node.tagName && !['SCRIPT', 'NOSCRIPT', 'STYLE'].includes(node.tagName) && !node.closest('#agentDriveAutoOverlay')) { - let visibility=node.offsetWidth > 0 && node.offsetHeight > 0; - let content = node.innerText.trim(); - if(visibility && node.innerText.trim() && window.getComputedStyle(node).display !== 'none'){ + await page.evaluate(""" + console.log('Adding a mutation observer for DOM changes'); + new MutationObserver((mutationsList, observer) => { + let changes_detected = []; + for(let mutation of mutationsList) { + if (mutation.type === 'childList') { + let allAddedNodes=mutation.addedNodes; + for(let node of allAddedNodes) { + if(node.tagName && !['SCRIPT', 'NOSCRIPT', 'STYLE'].includes(node.tagName) && !node.closest('#agentDriveAutoOverlay')) { + let visibility=true; + let content = node.innerText.trim(); + if(visibility && node.innerText.trim()){ + if(content) { + changes_detected.push({tag: node.tagName, content: content}); + } + } + } + } + } else if (mutation.type === 'characterData') { + let node = mutation.target; + if(node.parentNode && !['SCRIPT', 'NOSCRIPT', 'STYLE'].includes(node.parentNode.tagName) && !node.parentNode.closest('#agentDriveAutoOverlay')) { + let visibility=true; + let content = node.data.trim(); + if(visibility && content && window.getComputedStyle(node.parentNode).display !== 'none'){ if(content && !changes_detected.some(change => change.content.includes(content))) { - changes_detected.push({tag: node.tagName, content: content}); + changes_detected.push({tag: node.parentNode.tagName, content: content}); } - } + } } } } - } - if(changes_detected.length > 0) { - 
window.dom_mutation_change_detected(JSON.stringify(changes_detected)); - } - }).observe(document, {subtree: true, childList: true}); - """) + if(changes_detected.length > 0) { + window.dom_mutation_change_detected(JSON.stringify(changes_detected)); + } + }).observe(document, {subtree: true, childList: true, characterData: true}); + """) async def handle_navigation_for_mutation_observer(page:Page): @@ -61,7 +72,7 @@ async def dom_mutation_change_detected(changes_detected: str): """ Detects changes in the DOM (new nodes added) and emits the event to all subscribed callbacks. The changes_detected is a string in JSON formatt containing the tag and content of the new nodes added to the DOM. - + e.g. The following will be detected when autocomplete recommendations show up when one types Nelson Mandela on google search [{'tag': 'SPAN', 'content': 'nelson mandela wikipedia'}, {'tag': 'SPAN', 'content': 'nelson mandela movies'}] """ @@ -74,4 +85,4 @@ async def dom_mutation_change_detected(changes_detected: str): await callback(changes_detected) # If the callback is a regular function else: - callback(changes_detected) \ No newline at end of file + callback(changes_detected) diff --git a/ae/utils/gemini_llm_helper.py b/ae/utils/gemini_llm_helper.py index 0cc2dec..d6d4518 100644 --- a/ae/utils/gemini_llm_helper.py +++ b/ae/utils/gemini_llm_helper.py @@ -1,13 +1,11 @@ -import asyncio -from typing import Any -import google.generativeai as genai # type: ignore -from dotenv import load_dotenv import os import re -import json -from ae.utils.logger import logger -from ae.core.prompts import LLM_PROMPTS +from typing import Any +import google.generativeai as genai # type: ignore +from dotenv import load_dotenv + +from ae.utils.logger import logger GCP_BLOCK_NONE_SAFETY_SETTINGS: list[dict[str, str]] = [ { @@ -35,8 +33,7 @@ class GeminiLLMHelper: def __init__(self): load_dotenv() - genai.configure(api_key=os.environ.get("GEMINI_API_KEY")) - + 
genai.configure(api_key=os.environ.get("GEMINI_API_KEY")) # type: ignore def process_llm_response(self, response: str): if response: @@ -44,16 +41,14 @@ def process_llm_response(self, response: str): response = llm_json_or_python_begin_response_pattern.sub("", response) response = llm_end_response_pattern.sub("", response) return response - - async def get_chat_completion_response_async(self, system_msg:str, user_msgs:list[str], model_name:str="gemini-1.5-pro-latest", temperature:float=0.1, + async def get_chat_completion_response_async(self, system_msg:str, user_msgs:list[str], model_name:str="gemini-1.5-pro-latest", temperature:float=0.1, max_tokens:int=256, top_p:int=1, top_k: int=1, safety_settings:list[dict[str, str]]=GCP_BLOCK_NONE_SAFETY_SETTINGS) -> str|None: formatted_msgs: list[dict[str, Any]] = [{"role": "user", "parts": [system_msg]}] user_msgs_parts: list[str] = [] for user_msg in user_msgs: user_msgs_parts.append(user_msg) - - + formatted_msgs.append({"role": "user", "parts": user_msgs_parts}) response = None try: @@ -74,6 +69,7 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis return None # async def main(): +# from ae.core.prompts import LLM_PROMPTS # helper = GeminiLLMHelper() # response = await helper.get_chat_completion_response_async(LLM_PROMPTS["SKILLS_HARVESTING_PROMPT"], ["What is the weather like today?", "And How are you?"], temperature=0, max_tokens=4000) # print("*******\nResponse: ", response, "\n*******\n") diff --git a/ae/utils/get_detailed_accessibility_tree.py b/ae/utils/get_detailed_accessibility_tree.py index b218d31..f40e6f3 100644 --- a/ae/utils/get_detailed_accessibility_tree.py +++ b/ae/utils/get_detailed_accessibility_tree.py @@ -99,6 +99,9 @@ async def process_node(node: dict[str, Any]): if node['role'] == 'menuitem': return node.get('name') + if node.get('role') == 'dialog' and node.get('modal') == True: # noqa: E712 + node["important information"] = "This is a modal dialog. 
Please interact with this dialog and close it to be able to interact with the full page (e.g. by pressing the close button or selecting an option)." + if mmid: # Determine if we need to fetch 'innerText' based on the absence of 'children' in the accessibility node should_fetch_inner_text = 'children' not in node @@ -122,7 +125,6 @@ async def process_node(node: dict[str, Any]): console.log(`Ignoring element with id: ${element.id}`, element); return null; } - //Ignore "option" because it would have been processed with the select element if (tags_to_ignore.includes(element.tagName.toLowerCase()) || element.tagName.toLowerCase() === "option") return null; @@ -133,7 +135,7 @@ async def process_node(node: dict[str, Any]): // If the element is an input, include its type as well if (element.tagName.toLowerCase() === 'input') { attributes_to_values['tag_type'] = element.type; // This will capture 'checkbox', 'radio', etc. - } + } else if (element.tagName.toLowerCase() === 'select') { attributes_to_values["mmid"] = element.getAttribute('mmid'); attributes_to_values["role"] = "combobox"; @@ -150,7 +152,6 @@ async def process_node(node: dict[str, Any]): } return attributes_to_values; } - for (const attribute of attributes) { let value = element.getAttribute(attribute); @@ -169,6 +170,26 @@ async def process_node(node: dict[str, Any]): attributes_to_values['description'] = element.innerText; } + let role = element.getAttribute('role'); + if(role==='listbox' || element.tagName.toLowerCase()=== 'ul'){ + let children=element.children; + let filtered_children = Array.from(children).filter(child => child.getAttribute('role') === 'option'); + console.log("Listbox or ul found: ", filtered_children); + let attributes_to_include = ['mmid', 'role', 'aria-label','value']; + attributes_to_values["additional_info"]=[] + for (const child of children) { + let children_attributes_to_values = {}; + + for (let attr of child.attributes) { + // If the attribute is not in the predefined list, add 
it to children_attributes_to_values + if (attributes_to_include.includes(attr.name)) { + children_attributes_to_values[attr.name] = attr.value; + } + } + + attributes_to_values["additional_info"].push(children_attributes_to_values); + } + } // Check if attributes_to_values contains more than just 'name', 'role', and 'mmid' const keys = Object.keys(attributes_to_values); const minimalKeys = ['tag', 'mmid']; @@ -194,10 +215,10 @@ async def process_node(node: dict[str, Any]): // Check if the button has no text and no attributes if (element.innerText.trim() === '') { - + for (const child of children) { let children_attributes_to_values = {}; - + for (let attr of child.attributes) { // If the attribute is not in the predefined list, add it to children_attributes_to_values if (!attributes_to_exclude.includes(attr.name)) { @@ -228,7 +249,7 @@ async def process_node(node: dict[str, Any]): if 'keyshortcuts' in node: del node['keyshortcuts'] #remove keyshortcuts since it is not needed - + node["mmid"]=mmid # Update the node with fetched information @@ -241,7 +262,7 @@ async def process_node(node: dict[str, Any]): if 'name' in node and 'description' in node and (node['name'] == node['description'] or node['name'] == node['description'].replace('\n', ' ') or node['description'].replace('\n', '') in node['name']): del node['description'] #if the name is same as description, then remove the description to avoid duplication - + if 'name' in node and 'aria-label' in node and node['aria-label'] in node['name']: del node['aria-label'] #if the name is same as the aria-label, then remove the aria-label to avoid duplication @@ -252,7 +273,7 @@ async def process_node(node: dict[str, Any]): node.pop("children", None) node.pop("role", None) node.pop("description", None) - + #role and tag can have the same info. 
Get rid of role if it is the same as tag if node.get('role') == node.get('tag'): del node['role'] @@ -289,7 +310,7 @@ async def process_node(node: dict[str, Any]): } """ #textbox just means a text input and that is expressed well enough with the rest of the attributes returned - del node['role'] + #del node['role'] #remove attributes that are not needed once processing of a node is complete for attribute_to_delete in attributes_to_delete: @@ -411,11 +432,21 @@ def __should_prune_node(node: dict[str, Any], only_input_fields: bool): if node.get('role') == 'generic' and 'children' not in node and not ('name' in node and node.get('name')): # The presence of 'children' is checked after potentially deleting it above return True - + if node.get('role') in ['separator', 'LineBreak']: return True + processed_name = "" + if 'name' in node: + processed_name:str =node.get('name') # type: ignore + processed_name = processed_name.replace(',', '') + processed_name = processed_name.replace(':', '') + processed_name = processed_name.replace('\n', '') + processed_name = processed_name.strip() + if len(processed_name) <3: + processed_name = "" + #check if the node only have name and role, then delete that node - if len(node) == 2 and 'name' in node and 'role' in node: + if len(node) == 2 and 'name' in node and 'role' in node and not (node.get('role') == "text" and processed_name != ""): return True return False diff --git a/ae/utils/http_helper.py b/ae/utils/http_helper.py new file mode 100644 index 0000000..3520b68 --- /dev/null +++ b/ae/utils/http_helper.py @@ -0,0 +1,43 @@ +from typing import Any + +import requests + + +def make_post_request(url: str, data: dict[str, Any], api_key: str, api_key_header_name: str = "apikey") -> dict[str, Any]|None: + """ + Makes a POST request to the specified URL with a JSON body and an API key header. + + Args: + url (str): The URL to send the POST request to. + data (Dict[str, Any]): The JSON data to include in the POST request body. 
+ api_key (str): The API key to include in the request headers. + api_key_header_name (str): The name of the header to include the API key in. Defaults to "apikey". + + Returns: + Optional[Dict[str, Any]]: The JSON response from the server if the request was successful and the response is in JSON format. + None: If the request failed or the response is not in JSON format. + + Note: + Request errors (requests.exceptions.RequestException) are caught and printed; they are not raised to the caller. + """ + # Define the headers for the request + headers = { + 'Content-Type': 'application/json', + api_key_header_name: api_key + } + + try: + # Make the POST request with the given URL, data, and headers + response = requests.post(url, json=data, headers=headers) + + # Check if the request was successful + response.raise_for_status() + + # Attempt to return the JSON response + return response.json() + except requests.exceptions.RequestException as e: + print(f"Error: {e}") + return None + except ValueError: + print("Error: Response is not in JSON format") + return None diff --git a/ae/utils/js_helper.py b/ae/utils/js_helper.py index c473042..3ddc183 100644 --- a/ae/utils/js_helper.py +++ b/ae/utils/js_helper.py @@ -1,4 +1,7 @@ import json +import re + +from ae.utils.logger import logger def escape_js_message(message: str) -> str: @@ -12,3 +15,20 @@ def escape_js_message(message: str) -> str: str: The escaped message. """ return json.dumps(message) + + +def beautify_plan_message(message:str) -> str: + """ + Add a newline between each numbered step in the plan message if it does not already exist. + + Args: + message (str): The plan message. + + Returns: + str: The plan message with newlines added between each numbered step. 
+ """ + logger.debug(f"beautify_plan_message original:\n{message}") + # Add a newline before each numbered step that is not already preceded by a newline + plan_with_newlines = re.sub(r'(?<!\n)( \d+\.)', r'\n\1', message) + logger.debug(f"beautify_plan_message modified:\n{plan_with_newlines}") + return plan_with_newlines diff --git a/ae/utils/logger.py b/ae/utils/logger.py index 5e4d4e0..4674662 100644 --- a/ae/utils/logger.py +++ b/ae/utils/logger.py @@ -1,10 +1,16 @@ import logging logger = logging.getLogger(__name__) -logging.basicConfig( - level=logging.INFO, # change level here or use set_log_level() to change it - format="[%(asctime)s] %(levelname)s {%(filename)s:%(lineno)d} - %(message)s", -) +'''logging.basicConfig( + level=logging.DEBUG, # change level here or use set_log_level() to change it + format="[%(asctime)s] %(levelname)s {%(filename)s:%(lineno)d} - %(message)s", filename='app.log', filemode='a' +)''' +logging.basicConfig(level=logging.INFO) +logging.getLogger("httpcore").setLevel(logging.WARNING) +logging.getLogger("httpx").setLevel(logging.WARNING) +logging.getLogger("matplotlib.pyplot").setLevel(logging.WARNING) +logging.getLogger("PIL.PngImagePlugin").setLevel(logging.WARNING) +logging.getLogger("PIL.Image").setLevel(logging.WARNING) def set_log_level(level: str | int) -> None: """ diff --git a/ae/utils/response_parser.py b/ae/utils/response_parser.py new file mode 100644 index 0000000..bae1e70 --- /dev/null +++ b/ae/utils/response_parser.py @@ -0,0 +1,60 @@ +import json +from typing import Any + +from ae.utils.logger import logger + + +def parse_response(message: str) -> dict[str, Any]: + """ + Parse the response from the browser agent and return the response as a dictionary. 
+ """ + # Parse the response content + json_response = {} + #if message starts with ``` and ends with ``` then remove them + if message.startswith("```"): + message = message[3:] + if message.endswith("```"): + message = message[:-3] + if message.startswith("json"): + message = message[4:] + + message = message.strip() + try: + json_response: dict[str, Any] = json.loads(message) + except Exception as e: + # If the response is not a valid JSON, try pass it using string matching. + #This should seldom be triggered + logger.warn(f"LLM response was not properly formed JSON. Will try to use it as is. LLM response: \"{message}\". Error: {e}") + message = message.replace("\\n", "\n") + message = message.replace("\n", "") # type: ignore + if ("plan" in message and "next_step" in message): + start = message.index("plan") + len("plan") + end = message.index("next_step") + json_response["plan"] = message[start:end].replace('"', '').strip() + if ("next_step" in message and "terminate" in message): + start = message.index("next_step") + len("next_step") + end = message.index("terminate") + json_response["next_step"] = message[start:end].replace('"', '').strip() + if ("terminate" in message and "final_response" in message): + start = message.index("terminate") + len("terminate") + end = message.index("final_response") + matched_string=message[start:end].replace('"', '').strip() + if ("yes" in matched_string): + json_response["terminate"] = "yes" + else: + json_response["terminate"] = "no" + + start=message.index("final_response") + len("final_response") + end=len(message)-1 + json_response["final_response"] = message[start:end].replace('"', '').strip() + + elif ("terminate" in message): + start = message.index("terminate") + len("terminate") + end = len(message)-1 + matched_string=message[start:end].replace('"', '').strip() + if ("yes" in matched_string): + json_response["terminate"] = "yes" + else: + json_response["terminate"] = "no" + + return json_response diff --git 
a/ae/utils/ui_messagetype.py b/ae/utils/ui_messagetype.py new file mode 100644 index 0000000..f42d586 --- /dev/null +++ b/ae/utils/ui_messagetype.py @@ -0,0 +1,11 @@ +from enum import Enum + + +# class syntax +class MessageType(Enum): + PLAN = "plan" + STEP = "step" + ACTION ="action" + ANSWER = "answer" + QUESTION= "question" + INFO = "info" diff --git a/pyproject.toml b/pyproject.toml index cf15e49..d6259ce 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -22,7 +22,8 @@ dependencies = [ "pyautogen==0.2.27", "pydantic==2.6.2", "python-dotenv==1.0.0", - "tabulate==0.9.0" + "tabulate==0.9.0", + "nest-asyncio==1.6.0" ] [project.optional-dependencies] diff --git a/requirements.txt b/requirements.txt index d0a6fdf..b513ce0 100644 --- a/requirements.txt +++ b/requirements.txt @@ -91,6 +91,7 @@ idna==3.6 # requests joblib==1.3.2 # via nltk +nest-asyncio==1.6.0 nltk==3.8.1 numpy==1.26.4 # via diff --git a/scripts/aggregate_test_results.py b/scripts/aggregate_test_results.py new file mode 100644 index 0000000..ee576d4 --- /dev/null +++ b/scripts/aggregate_test_results.py @@ -0,0 +1,236 @@ +import argparse +import json +import os +from collections import Counter +from collections import defaultdict +from typing import Any +from typing import List + +import pandas as pd +from pandas.io.formats.style import Styler + +URL_ALIAS_MAP = { + "https://www.allrecipes.com/": "Allrecipes", + "https://www.amazon.com/": "Amazon", + "https://www.apple.com/": "Apple", + "https://arxiv.org/": "Arxiv", + "https://www.bbc.com/news/": "BBC", + "https://www.booking.com/": "Booking", + "https://dictionary.cambridge.org/": "Dictionary", + "https://www.coursera.org/": "Coursera", + "https://www.espn.com/": "ESPN", + "https://github.com/": "GitHub", + "https://www.google.com/travel/flights/": "Flights", + "https://www.google.com/maps/": "Maps", + "https://www.google.com/": "Google", + "https://huggingface.co/": "Hugging Face", + "https://www.wolframalpha.com/": "Wolfram" +} + +def 
find_and_read_json_files(test_results_dir: str, target_directory_name: str) -> list[dict[str, Any]]: + result_data: list[dict[str, Any]] = [] + + # Walk through the test results directory + for root, _dirs, files in os.walk(test_results_dir): + # Check if the target directory is in the current path + if target_directory_name in root: + # If found, iterate through the files in that directory + for file in files: + if file.endswith('.json'): + file_path = os.path.join(root, file) + # Read the JSON file and append its contents to the result_data list + with open(file_path, 'r') as json_file: + print(f"Reading file: {file_path}") + try: + data = json.load(json_file) + result_data.append(data) + except json.JSONDecodeError as e: + print(f"Error decoding JSON from file {file_path}: {e}") + + return result_data + +def save_to_json_file(data: Any, output_file: str): + with open(output_file, 'w') as json_output_file: + json.dump(data, json_output_file, indent=4) + +def extract_alias(url: str) -> str: + for known_url, alias in URL_ALIAS_MAP.items(): + if url.startswith(known_url): + return alias + return "Unknown" + +def count_scores_by_alias(data: list[dict[str, Any]]): + alias_score_counter = defaultdict(Counter) + overall_score_counter = Counter() + for entry in data: + score = entry.get('score') + start_url = entry.get('start_url') + if score is not None: + overall_score_counter[score] += 1 + if start_url: + alias = extract_alias(start_url) + alias_score_counter[alias][score] += 1 + return alias_score_counter, overall_score_counter + +def calculate_percentages(score_counter: Counter) -> tuple[dict[Any, float], int]: + total_count = sum(score_counter.values()) + score_percentages = {score: (count / total_count) * 100 for score, count in score_counter.items()} + return score_percentages, total_count + +def adjust_scores(data: list[dict[str, Any]], task_ids_to_flip: List[int]): + for entry in data: + if entry.get('task_id') in task_ids_to_flip: + if entry.get('score') == 1.0: + 
entry['score'] = 0.0 + return data + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Process some JSON files.") + parser.add_argument( + "test_results_dir", + type=str, + help="The base directory containing the test results." + ) + parser.add_argument( + "--target_directory_name", + type=str, + default="results_for_test_results_for_webvoyager_test", + help="The name of the target directory to search within the base directory." + ) + parser.add_argument( + "--output_file", + type=str, + default="compiled_test_results.json", + help="The name of the output file." + ) + parser.add_argument( + "--adjust_task_ids", + type=str, + help="Comma-separated list of task_id values to flip from score 1.0 to 0.0." + ) + + args = parser.parse_args() + + # Derive the full path for the output file + output_file_path = os.path.join(args.test_results_dir, args.output_file) + + # Find and read the JSON files + compiled_data = find_and_read_json_files(args.test_results_dir, args.target_directory_name) + # Sort the compiled data by 'task_index' + sorted_data: list[dict[str, Any]] = sorted(compiled_data, key=lambda x: x.get('task_index', -1)) + + print(f"Number of records found: {len(sorted_data)}") + # Save the compiled data to a JSON file + save_to_json_file(sorted_data, output_file_path) + + # Count the scores by alias and overall + alias_score_counts, overall_score_counts = count_scores_by_alias(sorted_data) + + # Calculate percentages by alias and overall + alias_score_percentages = { + alias: calculate_percentages(score_counter) + for alias, score_counter in alias_score_counts.items() + } + overall_score_percentages, overall_total = calculate_percentages(overall_score_counts) + + # Save the alias score percentages to a JSON file + output_results = { + "overall": { + "percentages": overall_score_percentages, + "counts": dict(overall_score_counts), + "total": overall_total + }, + "by_alias": { + alias: { + "percentages": percentages, + "counts": 
dict(alias_score_counts[alias]), + "total": total + } + for alias, (percentages, total) in alias_score_percentages.items() + } + } + alias_output_file_path = os.path.join(args.test_results_dir, "alias_score_percentages.json") + save_to_json_file(output_results, alias_output_file_path) + + # Print the overall results to the command line + print("\nOverall Score Percentages and Counts (Pre-adjustment):") + print(f"{'Score':<10}{'Percentage':<15}{'Count':<10}") + for score, percentage in overall_score_percentages.items(): + count = overall_score_counts[score] + print(f"{score:<10}{percentage:.2f}%{count:<10}") + + # Adjust scores based on provided task IDs + if args.adjust_task_ids: + task_ids_to_flip = list(map(int, args.adjust_task_ids.split(','))) + sorted_data = adjust_scores(sorted_data, task_ids_to_flip) + + # Recount the scores by alias and overall after adjustment + alias_score_counts_adjusted, overall_score_counts_adjusted = count_scores_by_alias(sorted_data) + + # Recalculate percentages by alias and overall after adjustment + alias_score_percentages_adjusted = { + alias: calculate_percentages(score_counter) + for alias, score_counter in alias_score_counts_adjusted.items() + } + overall_score_percentages_adjusted, overall_total_adjusted = calculate_percentages(overall_score_counts_adjusted) + + # Save the adjusted alias score percentages to a JSON file + adjusted_output_results = { + "overall": { + "percentages": overall_score_percentages_adjusted, + "counts": dict(overall_score_counts_adjusted), + "total": overall_total_adjusted + }, + "by_alias": { + alias: { + "percentages": percentages, + "counts": dict(alias_score_counts_adjusted[alias]), + "total": total + } + for alias, (percentages, total) in alias_score_percentages_adjusted.items() + } + } + adjusted_alias_output_file_path = os.path.join(args.test_results_dir, "adjusted_alias_score_percentages.json") + save_to_json_file(adjusted_output_results, adjusted_alias_output_file_path) + + # Print the 
overall results to the command line post adjustment + print("\nOverall Score Percentages and Counts (Post-adjustment):") + print(f"{'Score':<10}{'Percentage':<15}{'Count':<10}") + for score, percentage in overall_score_percentages_adjusted.items(): + count = overall_score_counts_adjusted[score] + print(f"{score:<10}{percentage:.2f}%{count:<10}") + + # Prepare data for DataFrame post adjustment + data = [] + for score in sorted(set(overall_score_counts_adjusted.keys()).union(*[alias_score_counts_adjusted[alias].keys() for alias in alias_score_counts_adjusted])): + row = {"Score": score} + row["Overall"] = f"{overall_score_percentages_adjusted.get(score, 0):.2f}% ({overall_score_counts_adjusted.get(score, 0)})" + for alias in sorted(URL_ALIAS_MAP.values()): + percentages, _ = alias_score_percentages_adjusted.get(alias, ({}, 0)) + counts = alias_score_counts_adjusted.get(alias, {}) + row[alias] = f"{percentages.get(score, 0):.2f}% ({counts.get(score, 0)})" + data.append(row) + + # Create DataFrame + df = pd.DataFrame(data) + + # Styling the DataFrame + styled_df = df.style.set_table_styles( + [ + {'selector': 'thead th', 'props': 'font-weight: bold; text-align: center;'}, + {'selector': 'th', 'props': 'text-align: center;'}, + {'selector': 'td', 'props': 'text-align: center;'}, + {'selector': 'table', 'props': 'border-collapse: collapse; width: 100%;'}, + {'selector': 'table, th, td', 'props': 'border: 1px solid black;'} + ] + ).set_caption("Benchmark Report") + + # Save to HTML with styled format + html_output_file = os.path.join(args.test_results_dir, "benchmark_report.html") + styled_df.to_html(html_output_file) + + print(f"\nBenchmark report has been saved to: {html_output_file}") + + +# Sample how to run: +# python scripts/aggregate_test_results.py /path/to/folder/agent_e_annotators_tests/round2 --adjust_task_ids "14, 26, 51, 63, 93, 141" \ No newline at end of file diff --git a/test/tasks/annotator_dry_run_webvoyager_tasks_30.json 
b/test/tasks/annotator_dry_run_webvoyager_tasks_30.json new file mode 100644 index 0000000..e4a5b17 --- /dev/null +++ b/test/tasks/annotator_dry_run_webvoyager_tasks_30.json @@ -0,0 +1,812 @@ +[ + { + "sites": null, + "task_id": 15, + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Choose a dessert recipe on Allrecipes with a prep time of less than 30 minutes, has chocolate as an ingredient, and has a user rating of 4 stars or higher. Provide the name of the recipe, ingredients list, and step-by-step instructions.", + "instantiation_dict": {}, + "intent": "Choose a dessert recipe on Allrecipes with a prep time of less than 30 minutes, has chocolate as an ingredient, and has a user rating of 4 stars or higher. Provide the name of the recipe, ingredients list, and step-by-step instructions.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Ultimate Chocolate Dessert', 4.7-star, prep time 15 mins", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Allrecipes--15", + "task_index": 0 + }, + { + "sites": null, + "task_id": 29, + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Search for a Mediterranean-style grilled fish recipe on Allrecipes that includes ingredients like olives, has at least a 4-star rating, and more than 25 reviews. Detail the ingredients, cooking method, and total time required for preparation and cooking.", + "instantiation_dict": {}, + "intent": "Search for a Mediterranean-style grilled fish recipe on Allrecipes that includes ingredients like olives, has at least a 4-star rating, and more than 25 reviews. 
Detail the ingredients, cooking method, and total time required for preparation and cooking.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Branzino Mediterranean', 36 reviews, <Ingredients> include olive oil, <cooking method>, Prep Time: 15 mins, Cook Time: 25 mins, Total Time: 40 mins", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Allrecipes--29", + "task_index": 1 + }, + { + "sites": null, + "task_id": 72, + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Look for a USB-C hub on Amazon compatible with MacBook Pro, featuring at least 4 ports, including HDMI and SD card reader. The price should be under $50. Select the one after sorting by Best Sellers.", + "instantiation_dict": {}, + "intent": "Look for a USB-C hub on Amazon compatible with MacBook Pro, featuring at least 4 ports, including HDMI and SD card reader. The price should be under $50. Select the one after sorting by Best Sellers.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Hiearcool USB C Hub, USB C Multi-Port Adapter for MacBook Pro, 7IN1, include 4K HDMI USB3.0 and SD/TF Card Reader, $24.99", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Amazon--27", + "task_index": 2 + }, + { + "sites": null, + "task_id": 85, + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Locate a women's yoga mat in purple, with a thickness of at least 5mm, rated 4+ stars, and priced under $30 on Amazon. 
Check how many colors are available in total, and what is the return and delivery policy.", + "instantiation_dict": {}, + "intent": "Locate a women's yoga mat in purple, with a thickness of at least 5mm, rated 4+ stars, and priced under $30 on Amazon. Check how many colors are available in total, and what is the return and delivery policy.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "ProsourceFit Extra Thick Yoga Pilates Exercise Mat, 1/2\", 4.6 stars, $21.99, 7 colors, FREE delivery Friday, March 1 on orders shipped by Amazon over $35", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Amazon--40", + "task_index": 3 + }, + { + "sites": null, + "task_id": 97, + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Get information about the latest iPad model released by Apple, including its release date, base storage capacity, and starting price available on Apple's official website.", + "instantiation_dict": {}, + "intent": "Get information about the latest iPad model released by Apple, including its release date, base storage capacity, and starting price available on Apple's official website.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "sixth-generation iPad Pro 11\u2011inch, iPad Pro 12.9\u2011inch; release date: October 26, 2022; base storage capacity 128 GB, starting price $799", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Apple--11", + "task_index": 4 + }, + { + "sites": null, + "task_id": 100, + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Identify the upgrade options available for the cheapest base model of the 
MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "instantiation_dict": {}, + "intent": "Identify the upgrade options available for the cheapest base model of the MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Base model:$1599, difference: $1020", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Apple--14", + "task_index": 5 + }, + { + "sites": null, + "task_id": 168, + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search the title 'GPT-4 Technical Report' and access this paper through HTML format. Read the paper on this page and tell me what is 'one of the main goals of developing such models' mentioned in the Introduction.", + "instantiation_dict": {}, + "intent": "Search the title 'GPT-4 Technical Report' and access this paper through HTML format. 
Read the paper on this page and tell me what is 'one of the main goals of developing such models' mentioned in the Introduction.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "One of the main goals of developing such models is to improve their ability to understand and generate natural language text, particularly in more complex and nuanced scenarios.", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ArXiv--39", + "task_index": 6 + }, + { + "sites": null, + "task_id": 169, + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "How many articles are there on each of the three most recent announce days in the Solar and Stellar Astrophysics section of ArXiv. Choose one at random and answer its title and when the first version was uploaded?", + "instantiation_dict": {}, + "intent": "How many articles are there on each of the three most recent announce days in the Solar and Stellar Astrophysics section of ArXiv. Choose one at random and answer its title and when the first version was uploaded?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "astro-ph.SR paper, latest 3 days", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ArXiv--40", + "task_index": 7 + }, + { + "sites": null, + "task_id": 186, + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find a news article on BBC News about the impact of the recent tech industry layoffs on the global economy. 
Summarize the key points and the name of the author, and provide the date of publication.", + "instantiation_dict": {}, + "intent": "Find a news article on BBC News about the impact of the recent tech industry layoffs on the global economy. Summarize the key points and the name of the author, and provide the date of publication.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<title>, <author>, <summary> (impact of the recent tech industry layoffs on the global economy)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "BBC News--14", + "task_index": 8 + }, + { + "sites": null, + "task_id": 213, + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find Golf in BBC News, check the Leaderboard at this point in Women's Majors and count which country has the most players in the top 20? Which player has the best score amongst the Australian players and in what place.", + "instantiation_dict": {}, + "intent": "Find Golf in BBC News, check the Leaderboard at this point in Women's Majors and count which country has the most players in the top 20? 
Which player has the best score amongst the Australian players and in what place.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Sport - Golf - Leaderboard - Women's Majors, most in top20: American, best in Australian: Grace Kim in 36", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "BBC News--41", + "task_index": 9 + }, + { + "sites": null, + "task_id": 230, + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. Provide the name, location, and price per night.", + "instantiation_dict": {}, + "intent": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. Provide the name, location, and price per night.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Zoku Paris; 48 Avenue de la Porte de Clichy, 17th arr., Paris; US$210 per night", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Booking--16", + "task_index": 10 + }, + { + "sites": null, + "task_id": 232, + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Search a hotel in London with a user rating of 8 or higher for a stay between February 14th, 2024, and February 21st, 2024, suitable for a couple. 
Provide the name and a short description of the hotel.", + "instantiation_dict": {}, + "intent": "Search a hotel in London with a user rating of 8 or higher for a stay between February 14th, 2024, and February 21st, 2024, suitable for a couple. Provide the name and a short description of the hotel.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Cromwell Serviced Apartments; Cromwell Serviced Apartments is an apartment featuring rooms with free Wifi and air conditioning in the center of London", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Booking--18", + "task_index": 11 + }, + { + "sites": null, + "task_id": 262, + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Look for the British English pronunciation of the word \"innovate\" and write down the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the Cambridge Dictionary that uses this word.", + "instantiation_dict": {}, + "intent": "Look for the British English pronunciation of the word \"innovate\" and write down the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the Cambridge Dictionary that uses this word.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /\u02c8\u026an.\u0259.ve\u026at/; Above all, this proposal aims to correct the allocative inefficiencies of the existing patent system, while preserving the dynamic incentives to innovate.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Cambridge Dictionary--4", + "task_index": 12 + }, + { + "sites": null, + "task_id": 281, + "require_login": false, + "storage_state": null, + "start_url": 
"https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Find the US English pronunciation of the word \"meticulous\" using the Cambridge Dictionary and note the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the dictionary using this word.", + "instantiation_dict": {}, + "intent": "Find the US English pronunciation of the word \"meticulous\" using the Cambridge Dictionary and note the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the dictionary using this word.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "US: /m\u0259\u02c8t\u026ak.j\u0259.l\u0259s/; Many hours of meticulous preparation have gone into writing the book.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Cambridge Dictionary--23", + "task_index": 13 + }, + { + "sites": null, + "task_id": 325, + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Locate an online course on Coursera related to 'Sustainability' that belongs to Physical Science and Engineering subject. The course should include a module on Measuring Sustainability. Note the course duration and the offering institution.", + "instantiation_dict": {}, + "intent": "Locate an online course on Coursera related to 'Sustainability' that belongs to Physical Science and Engineering subject. The course should include a module on Measuring Sustainability. Note the course duration and the offering institution.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Introduction to Sustainability; University of Illinois at Urbana-Champaign; Instructors: Dr. Jonathan Tomkin; duration: Approx. 
25 hours to complete, 3 weeks at 8 hours a week", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Coursera--24", + "task_index": 14 + }, + { + "sites": null, + "task_id": 327, + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Identify a Specialization on Coursera that offers an overview of 'Renewable Energy'. The Specialization should be beginner-level and include a course on Renewable Energy Futures. Note the instructor's name and the number of weeks required to complete the course if I spend 5 hours a week.", + "instantiation_dict": {}, + "intent": "Identify a Specialization on Coursera that offers an overview of 'Renewable Energy'. The Specialization should be beginner-level and include a course on Renewable Energy Futures. Note the instructor's name and the number of weeks required to complete the course if I spend 5 hours a week.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Renewable Energy Specialization; Instructors: Stephen R. Lawrence, Paul Komor; 2 months", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Coursera--26", + "task_index": 15 + }, + { + "sites": null, + "task_id": 373, + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check out the NHL Standings 2023-24 on ESPN to see which teams are at the top and which are at the bottom in Eastern and Western Conference. What about the situation in Division.", + "instantiation_dict": {}, + "intent": "Check out the NHL Standings 2023-24 on ESPN to see which teams are at the top and which are at the bottom in Eastern and Western Conference. 
What about the situation in Division.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "NHL Standings 2023-24, top - bottom, Eastern Conference: New York Rangers - Columbus Blue Jackets; Western Conference: Vancouver Canucks - Chicago Blackhawks; Division: ATLANTIC, Boston Bruins - Montreal Canadiens; METROPOLITAN: New York Rangers - Columbus Blue Jackets; CENTRAL: Dallas Stars - Chicago Blackhawks; PACIFIC: Vancouver Canucks - San Jose Sharks", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ESPN--30", + "task_index": 16 + }, + { + "sites": null, + "task_id": 381, + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check Los Angeles Lakers Stats 2023-24, calculate Anthony Davis' games played (GP) percentage, tell me if there are other players with the same games played percentage as Anthony Davis.", + "instantiation_dict": {}, + "intent": "Check Los Angeles Lakers Stats 2023-24, calculate Anthony Davis' games played (GP) percentage, tell me if there are other players with the same games played percentage as Anthony Davis.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "54/58 = 93.1%, no other players, https://www.espn.com/nba/team/stats/_/name/lal/los-angeles-lakers", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ESPN--38", + "task_index": 17 + }, + { + "sites": null, + "task_id": 398, + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find a newly created open-source project on GitHub related to 'climate change' that has been initiated in January 2023; check the main programming language used and the project's description.", + 
"instantiation_dict": {}, + "intent": "Find a newly created open-source project on GitHub related to 'climate change' that has been initiated in January 2023; check the main programming language used and the project's description.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "TheAIDojo/AI-for-Climate-Change; Jupyter Notebook; Repository of notebooks and associated code that covers the fundamental concepts of deep learning and its application to climate science.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "GitHub--11", + "task_index": 18 + }, + { + "sites": null, + "task_id": 402, + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Locate a repository on GitHub related to 'quantum computing' that has been updated within the last week and has at least 50 stars. Provide a brief description of the project.", + "instantiation_dict": {}, + "intent": "Locate a repository on GitHub related to 'quantum computing' that has been updated within the last week and has at least 50 stars. Provide a brief description of the project.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "desireevl/awesome-quantum-computing", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "GitHub--15", + "task_index": 19 + }, + { + "sites": null, + "task_id": 462, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Compare business class flight options from Lisbon to Singapore for a one-way trip on March 15, 2024, select one of the flights and see which websites offer its booking options. 
Which one is the cheapest.", + "instantiation_dict": {}, + "intent": "Compare business class flight options from Lisbon to Singapore for a one-way trip on March 15, 2024, select one of the flights and see which websites offer its booking options. Which one is the cheapest.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Emirates, 8:45\u202fPM \u2013 9:15\u202fPM(+1), booking options: Emirates, Gotogate, Martigo, Expedia, kiss&fly, eDreams ... cheapest: Gotogate", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Flights--34", + "task_index": 20 + }, + { + "sites": null, + "task_id": 465, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Locate a round-trip flight from Buenos Aires to Beijing, leaving on February 28, 2024, and returning on March 3, 2024, check out one of the options and tell me if the airline for my return flight is the same as my departure flight.", + "instantiation_dict": {}, + "intent": "Locate a round-trip flight from Buenos Aires to Beijing, leaving on February 28, 2024, and returning on March 3, 2024, check out one of the options and tell me if the airline for my return flight is the same as my departure flight.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Lufthansa, 5:50\u202fPM \u2013 9:30\u202fAM(+2), return flight can be Lufthansa, 11:20\u202fAM \u2013 7:55\u202fAM(+1), the same as departure flight", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Flights--37", + "task_index": 21 + }, + { + "sites": null, + "task_id": 489, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + 
"intent_template": "I will arrive Pittsburgh Airport soon. Provide the name of the Hilton hotel closest to the airport. Then, tell me the the walking time to the nearest supermarket from the hotel.", + "instantiation_dict": {}, + "intent": "I will arrive Pittsburgh Airport soon. Provide the name of the Hilton hotel closest to the airport. Then, tell me the the walking time to the nearest supermarket from the hotel.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Hilton Garden Inn Pittsburgh Airport, walking time around 15min - 30min", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Map--19", + "task_index": 22 + }, + { + "sites": null, + "task_id": 503, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Check out Denver International Airport's information and tell me: 1) which level has the least proportion in reviews; 2) what are its Accessibility and Amenities.", + "instantiation_dict": {}, + "intent": "Check out Denver International Airport's information and tell me: 1) which level has the least proportion in reviews; 2) what are its Accessibility and Amenities.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "star 2 has the least proportion; Accessibility: Assistive hearing loop; Wheelchair accessible entrance; Wheelchair accessible parking lot; Wheelchair accessible restroom; Wheelchair accessible seating; Amenities: Baggage storage; Wi-Fi; Free Wi-Fi", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Map--33", + "task_index": 23 + }, + { + "sites": null, + "task_id": 519, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + 
"intent_template": "Find the video on YouTube: 'Oscars 2023: Must-See Moments!'. Tell me who the first comment displayed under that video belongs to, and how many thumbs up and replies it has.", + "instantiation_dict": {}, + "intent": "Find the video on YouTube: 'Oscars 2023: Must-See Moments!'. Tell me who the first comment displayed under that video belongs to, and how many thumbs up and replies it has.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "user: @melvinsmiley5295, 329 thumbs up and 2 replies (real-time)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Search--8", + "task_index": 24 + }, + { + "sites": null, + "task_id": 544, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Find and copy the SHA of the latest commit in the TensorFlow repository on GitHub, then find a textbox to paste and tell me what the SHA is.", + "instantiation_dict": {}, + "intent": "Find and copy the SHA of the latest commit in the TensorFlow repository on GitHub, then find a textbox to paste and tell me what the SHA is.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<SHA> of latest Tensorflow", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Search--33", + "task_index": 25 + }, + { + "sites": null, + "task_id": 560, + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find the model sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "instantiation_dict": {}, + "intent": "Find the model 
sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "0.550", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Huggingface--6", + "task_index": 26 + }, + { + "sites": null, + "task_id": 571, + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find the most recently updated open-source project related to natural language processing on the Huggingface platform. Provide the project's name, creator, and a brief description of its functionality.", + "instantiation_dict": {}, + "intent": "Find the most recently updated open-source project related to natural language processing on the Huggingface platform. 
Provide the project's name, creator, and a brief description of its functionality.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<model>; <creator>; <description> (recent, NLP)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Huggingface--17", + "task_index": 27 + }, + { + "sites": null, + "task_id": 604, + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Give the final angle and final length after 6s of a Spring pendulum with spring equilibrium length=0.12m, initial length=0.24m, initial angle=80deg, mass=1kg, spring constant=120 N/m .", + "instantiation_dict": {}, + "intent": "Give the final angle and final length after 6s of a Spring pendulum with spring equilibrium length=0.12m, initial length=0.24m, initial angle=80deg, mass=1kg, spring constant=120 N/m .", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "-73.26\u00b0 from vertical; 0.252 m", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Wolfram Alpha--7", + "task_index": 28 + }, + { + "sites": null, + "task_id": 640, + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "instantiation_dict": {}, + "intent": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? 
Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "35; 12", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Wolfram Alpha--43", + "task_index": 29 + } +] \ No newline at end of file diff --git a/test/tasks/webvoyager_sampled_data.json b/test/tasks/webvoyager_sampled_data.json new file mode 100644 index 0000000..ed960e4 --- /dev/null +++ b/test/tasks/webvoyager_sampled_data.json @@ -0,0 +1,1627 @@ +[ + { + "sites": null, + "task_id": "Allrecipes--0", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Provide a recipe for vegetarian lasagna with more than 100 reviews and a rating of at least 4.5 stars suitable for 6 people.", + "instantiation_dict": {}, + "intent": "Provide a recipe for vegetarian lasagna with more than 100 reviews and a rating of at least 4.5 stars suitable for 6 people.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Vegetarian Four Cheese Lasagna', 4.6-star, 181 reviews, Servings 8", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--10", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Find The Most Popular Recipes of the 1960s, noting the recipe name, preparation time and total time of the second recipe in this collection.", + "instantiation_dict": {}, + "intent": "Find The Most Popular Recipes of the 1960s, noting the recipe name, preparation time and total time of the second recipe in this collection.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + 
"reference_answers": { + "manual_check": { + "answer": "'Swedish Meatballs I', prep time 25 mins, total time 1 hour 25 mins", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--20", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Find a recipe for a cauliflower pizza crust that has a preparation time of under 30 minutes and a rating of at least 4 stars on Allrecipes. Include the number of calories per serving.", + "instantiation_dict": {}, + "intent": "Find a recipe for a cauliflower pizza crust that has a preparation time of under 30 minutes and a rating of at least 4 stars on Allrecipes. Include the number of calories per serving.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Cauliflower Pizza Crust', 4.2 stars, Prep Time: 15 mins, 59 Calories per serving", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--30", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Find a recipe for a vegan smoothie bowl on Allrecipes that includes bananas and leaves, has more than 20 reviews, and a rating of at least 4 stars. Provide a list of ingredients, preparation time, and a summary of the recipe steps.", + "instantiation_dict": {}, + "intent": "Find a recipe for a vegan smoothie bowl on Allrecipes that includes bananas and leaves, has more than 20 reviews, and a rating of at least 4 stars. 
Provide a list of ingredients, preparation time, and a summary of the recipe steps.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Spinach and Banana Power Smoothie', 4.8 stars, 72 reviews, Ingredients: 1 cup plain soy milk, 3/4 cup packed fresh spinach leaves, 1 large banana, sliced; Prep Time: 10 mins; <steps>", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--40", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Browse the about us section of Allrecipes for a brief introduction to The Allrecipes Allstars.", + "instantiation_dict": {}, + "intent": "Browse the about us section of Allrecipes for a brief introduction to The Allrecipes Allstars.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "The Allrecipes Allstars: Social media influencers, registered dietitians, grillmasters, and more seasoned home cooks make up our enthusiastic squad of 100+ brand ambassadors. This diverse, food-loving crew spans the U.S. geographically and represents many different cultures, ethnicities, and family makeups. 
Since 2011, the Allstars have created tens of thousands of original recipes, photos, and reviews plus shared their cooking expertise via flat and video content on our website, social media, plus more marketing channels.", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--5", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Find a Blue iPhone 12 Pro 128gb and add to cart.", + "instantiation_dict": {}, + "intent": "Find a Blue iPhone 12 Pro 128gb and add to cart.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Apple iPhone 12 Pro, 128GB, Pacific Blue - Fully Unlocked (Renewed); Action: ADD_TO_CHART", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--15", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Find a pair of mens running shoes in black, size 7, 4+ stars and under $50 and add them to my cart on Amazon.", + "instantiation_dict": {}, + "intent": "Find a pair of mens running shoes in black, size 7, 4+ stars and under $50 and add them to my cart on Amazon.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Damyuan Men's Sport Gym Running Shoes Walking Shoes Casual Lace Up Lightweight; black, size 7, 4.0-star, $29.99", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--25", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Search for a queen-sized, hypoallergenic mattress topper on Amazon. 
It should have a memory foam material and be priced between $50 to $100.", + "instantiation_dict": {}, + "intent": "Search for a queen-sized, hypoallergenic mattress topper on Amazon. It should have a memory foam material and be priced between $50 to $100.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "2 Inch 7-Zone Memory Foam Mattress Topper Queen with 100% Bamboo Rayon Cover, Cooling Gel-Infused Swirl Egg Crate Memory Foam, $99.99", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--35", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Find a men's leather wallet on Amazon with RFID blocking, at least 6 card slots, and priced below $50. Check if it's available for FREE delivery.", + "instantiation_dict": {}, + "intent": "Find a men's leather wallet on Amazon with RFID blocking, at least 6 card slots, and priced below $50. 
Check if it's available for FREE delivery.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "STAY FINE Top Grain Leather Wallet for Men, RFID Blocking, Slim Billfold with 8 Card Slots, FREE delivery Friday, March 1", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--4", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "How much does it cost to buy a Macbook pro, 16-inch, Apple M3 Max chip with 16-core CPU, 40-core GPU, 64GB unified memory, 1TB SSD.", + "instantiation_dict": {}, + "intent": "How much does it cost to buy a Macbook pro, 16-inch, Apple M3 Max chip with 16-core CPU, 40-core GPU, 64GB unified memory, 1TB SSD.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "$4,199.00 or $349.91/mo.per month for 12 mo.*", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--14", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Identify the upgrade options available for the cheapest base model of the MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "instantiation_dict": {}, + "intent": "Identify the upgrade options available for the cheapest base model of the MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Base model:$1599, 
difference: $1020", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--24", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Find out the starting price for the most recent model of the iMac on the Apple website.", + "instantiation_dict": {}, + "intent": "Find out the starting price for the most recent model of the iMac on the Apple website.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "$1299.00", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--34", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Identify the size and weight for the Apple TV 4K and list the Siri Remote features introduced.", + "instantiation_dict": {}, + "intent": "Identify the size and weight for the Apple TV 4K and list the Siri Remote features introduced.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Height: 1.2 inches (31 mm), Width: 3.66 inches (93 mm), Depth: 3.66 inches (93 mm); Siri Remote features", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--1", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search for the latest research papers on quantum computing submitted to ArXiv in the last 2 days.", + "instantiation_dict": {}, + "intent": "Search for the latest research papers on quantum computing submitted to ArXiv in the last 2 days.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { 
+ "manual_check": { + "answer": "Paper related to quantum computing (latest 2 days)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--11", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "For Non-English submissions, do I need to provide a multi-language abstract, if need, answer the separator between the multiple abstracts.", + "instantiation_dict": {}, + "intent": "For Non-English submissions, do I need to provide a multi-language abstract, if need, answer the separator between the multiple abstracts.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "-----", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--21", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search for papers on 'neural networks for image processing' in the Computer Science category on ArXiv and report how many were submitted in the last week.", + "instantiation_dict": {}, + "intent": "Search for papers on 'neural networks for image processing' in the Computer Science category on ArXiv and report how many were submitted in the last week.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "cs paper related to 'neural networks for image processing',", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--31", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search ArXiv for papers with 'Graph Neural Networks' in the abstract that were submitted between Jan 1, 
2024, and Jan 3, 2024, and determine how many of these papers have more than five authors.", + "instantiation_dict": {}, + "intent": "Search ArXiv for papers with 'Graph Neural Networks' in the abstract that were submitted between Jan 1, 2024, and Jan 3, 2024, and determine how many of these papers have more than five authors.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "7 papers", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--41", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Find the button to share arxiv non-profit store and follow the QR code to share the shop. Then add arXiv Forever short sleeve (XL) to your cart.", + "instantiation_dict": {}, + "intent": "Find the button to share arxiv non-profit store and follow the QR code to share the shop. 
Then add arXiv Forever short sleeve (XL) to your cart.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "QR code image, Action: add to cart", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--8", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Get a brief overview of the economic implications of the UK's latest trade deal posted on BBC News and the date when the article was published.", + "instantiation_dict": {}, + "intent": "Get a brief overview of the economic implications of the UK's latest trade deal posted on BBC News and the date when the article was published.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "CPTPP trade deal, <summary>; 16th July 2023", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--18", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Visit BBC News Audio, What are the best PodCasts for 2023? List 2 of them.", + "instantiation_dict": {}, + "intent": "Visit BBC News Audio, What are the best PodCasts for 2023? 
List 2 of them.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "2 of them: Believe in Magic, The Gift, Vishal, A Very British Cult, People Who Knew Me, History's Secret Heroes", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--28", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find the Market Data section on BBC News and tell me which company the data comes from.", + "instantiation_dict": {}, + "intent": "Find the Market Data section on BBC News and tell me which company the data comes from.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Business - Market Data, Source: Morningstar", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--38", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find news related to the storm in Weather section and indicate where and when the severe weather occurred.", + "instantiation_dict": {}, + "intent": "Find news related to the storm in Weather section and indicate where and when the severe weather occurred.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Earth - Weather & Science, article about severe weather, eg, You can't hear it, but this sound can reveal that a tornado is on its way", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--6", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + 
"intent_template": "Book one room which provides breakfast, and airport shuttle from Jan 22 to 25 in Los Angeles.", + "instantiation_dict": {}, + "intent": "Book one room which provides breakfast, and airport shuttle from Jan 22 to 25 in Los Angeles.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "La Quinta by Wyndham LAX", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--16", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. Provide the name, location, and price per night.", + "instantiation_dict": {}, + "intent": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. 
Provide the name, location, and price per night.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Zoku Paris; 48 Avenue de la Porte de Clichy, 17th arr., Paris; US$210 per night", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--26", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Check Booking.com for a 3-star hotel or higher in Paris with a guest rating above 8.0 and available parking for dates February 20-23, 2024.", + "instantiation_dict": {}, + "intent": "Check Booking.com for a 3-star hotel or higher in Paris with a guest rating above 8.0 and available parking for dates February 20-23, 2024.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "InterContinental Paris Le Grand, an IHG Hotel, US$2208, 8.6 ratings, 5-star, parking", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--36", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Search for a budget hotel in Rome under $100 per night for one adult from March 20 to March 23, 2024. Sort the results by price, identify if any of top three results offer breakfast.", + "instantiation_dict": {}, + "intent": "Search for a budget hotel in Rome under $100 per night for one adult from March 20 to March 23, 2024. 
Sort the results by price, identify if any of top three results offer breakfast.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "ROMA GONDOLA SRLS, US$81, no breakfast", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--2", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Look up the pronunciation, definition, and example sentence for the word \"ubiquitous\" in UK and US English.", + "instantiation_dict": {}, + "intent": "Look up the pronunciation, definition, and example sentence for the word \"ubiquitous\" in UK and US English.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /ju\u02d0\u02c8b\u026ak.w\u026a.t\u0259s/, US: /ju\u02d0\u02c8b\u026ak.w\u0259.t\u032c\u0259s/; seeming to be everywhere; Leather is very much in fashion this season, as is the ubiquitous denim.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--12", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Find the pronunciation, definition, and a sample sentence for the word \"resilience\" in the Cambridge Dictionary.", + "instantiation_dict": {}, + "intent": "Find the pronunciation, definition, and a sample sentence for the word \"resilience\" in the Cambridge Dictionary.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /r\u026a\u02c8z\u026al.j\u0259ns/, US: /r\u026a\u02c8z\u026al.j\u0259ns/; the ability to be happy, successful, etc. 
again after something difficult or bad has happened; Trauma researchers emphasize the resilience of the human psyche.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--22", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Use the Cambridge Dictionary to find the definition, UK pronunciation, and an example sentence for the word \"quintessential.\"", + "instantiation_dict": {}, + "intent": "Use the Cambridge Dictionary to find the definition, UK pronunciation, and an example sentence for the word \"quintessential.\"", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /\u02cckw\u026an.t\u026a\u02c8sen.\u0283\u0259l/, US:/\u02cckw\u026an.t\u026a\u02c8sen.\u0283\u0259l/; Def: being the most typical example or most important part of something; Sheep's milk cheese is the quintessential Corsican cheese.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--32", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Search for the differences between \"fewer\" and \"less\" in grammar section, and provide examples illustrating their correct usage from the Cambridge Dictionary.", + "instantiation_dict": {}, + "intent": "Search for the differences between \"fewer\" and \"less\" in grammar section, and provide examples illustrating their correct usage from the Cambridge Dictionary.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Article: 'Less or fewer?'; I do less work at weekends than I used to; Better cycle routes would mean fewer cars 
and fewer accidents.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--42", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Convert the Cambridge Dictionary homepage from English (UK) to Deutsch.", + "instantiation_dict": {}, + "intent": "Convert the Cambridge Dictionary homepage from English (UK) to Deutsch.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Action: Click English (UK), change language to: Deutsch", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--9", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Identify a Coursera course on artificial intelligence ethics that has a duration of less than 20 hours to complete and has been rated 4+ stars by participants.", + "instantiation_dict": {}, + "intent": "Identify a Coursera course on artificial intelligence ethics that has a duration of less than 20 hours to complete and has been rated 4+ stars by participants.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Artificial Intelligence: Ethics & Societal Challenges", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--19", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Identify a course on Coursera that provides an introduction to Psychology, list the instructor's name, the institution offering it, and how many hours it will approximately take to complete.", + 
"instantiation_dict": {}, + "intent": "Identify a course on Coursera that provides an introduction to Psychology, list the instructor's name, the institution offering it, and how many hours it will approximately take to complete.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Instructor: Paul Bloom; Yale University; 14 hours", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--29", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Browse the Coursera website and find the price required for one year of Coursera Plus. How much is the discount? Then list 3 companies that work with Coursera.", + "instantiation_dict": {}, + "intent": "Browse the Coursera website and find the price required for one year of Coursera Plus. How much is the discount? Then list 3 companies that work with Coursera.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "$399/year, discount: 59 / month * 12 - 399 = 309; Google, IBM, and Imperial College London ...", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--39", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Find the Space Safety course offered by TUM on Coursera. How many videos are there in module 2? What is the name of each video?", + "instantiation_dict": {}, + "intent": "Find the Space Safety course offered by TUM on Coursera. How many videos are there in module 2? 
What is the name of each video?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "6 videos; Introduction; Space Debris; Mitigation; Measurements; Protection; Atmospheric Re-entry", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--7", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Retrieve the final score and a brief summary of the latest NBA game played by the Los Angeles Lakers as reported on ESPN.", + "instantiation_dict": {}, + "intent": "Retrieve the final score and a brief summary of the latest NBA game played by the Los Angeles Lakers as reported on ESPN.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<score> (latest, Los Angeles Lakers vs xxx); <summary>", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--17", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check out the NBA Basketball Power Index 2023-24 to see which teams are in first place and which are in last place.", + "instantiation_dict": {}, + "intent": "Check out the NBA Basketball Power Index 2023-24 to see which teams are in first place and which are in last place.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Boston Celtics; San Antonio Spurs", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--27", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Search on ESPN 
for how many teams have 'Golden' in their name and how many of them are in the NHL.", + "instantiation_dict": {}, + "intent": "Search on ESPN for how many teams have 'Golden' in their name and how many of them are in the NHL.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "30 teams in search results, 1 team Vegas Golden Knights (NHL)", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--37", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check out LeBron James' Stats to see how many games he has played in his career so far.", + "instantiation_dict": {}, + "intent": "Check out LeBron James' Stats to see how many games he has played in his career so far.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "1471", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--3", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find out how much more package storage the Enterprise version has over Team in GitHub Pricing.", + "instantiation_dict": {}, + "intent": "Find out how much more package storage the Enterprise version has over Team in GitHub Pricing.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "48GB", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--13", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Identify the latest top-trending open-source 
project in the category of 'Machine Learning' on GitHub, and check the number of stars it has received.", + "instantiation_dict": {}, + "intent": "Identify the latest top-trending open-source project in the category of 'Machine Learning' on GitHub, and check the number of stars it has received.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "microsoft/ML-For-Beginners", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--23", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find the wiki page of ohmyzsh on GitHub and tell me how to change the theme of zsh to agnoster.", + "instantiation_dict": {}, + "intent": "Find the wiki page of ohmyzsh on GitHub and tell me how to change the theme of zsh to agnoster.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "edit the .zshrc file and set the ZSH_THEME variable to \"agnoster\"", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--33", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find Customer Stories on the GitHub page and list the 2 stories that appear on the web page.", + "instantiation_dict": {}, + "intent": "Find Customer Stories on the GitHub page and list the 2 stories that appear on the web page.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Philips builds and deploys digital health technology faster with innersource on GitHub. 
Shopify keeps pushing eCommerce forward with help from GitHub tools.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--2", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Find the lowest fare from all eligible one-way flights for 1 adult from JFK to Heathrow on Jan. 22.", + "instantiation_dict": {}, + "intent": "Find the lowest fare from all eligible one-way flights for 1 adult from JFK to Heathrow on Jan. 22.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Tap Air Portugal 10:00\u202fPM \u2013 5:30\u202fPM(+1), $355 (real-time)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--12", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Find the best-priced round-trip flight from New York to London leaving on December 25, 2023, and returning on January 5, 2024, with one stop or fewer.", + "instantiation_dict": {}, + "intent": "Find the best-priced round-trip flight from New York to London leaving on December 25, 2023, and returning on January 5, 2024, with one stop or fewer.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Norse Atlantic UK, 6:10\u202fPM \u2013 6:00\u202fAM(+1), $757, Nonstop (real-time)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--22", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Find a round-trip flight from 
Rio de Janeiro to Los Angeles, leaving on March 15, 2024, and returning on March 22, 2024, and select the option with the least carbon dioxide emissions.", + "instantiation_dict": {}, + "intent": "Find a round-trip flight from Rio de Janeiro to Los Angeles, leaving on March 15, 2024, and returning on March 22, 2024, and select the option with the least carbon dioxide emissions.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Gol, Aeromexico, 7:00\u202fAM \u2013 10:22\u202fPM, 746 kg CO2", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--32", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Search for round-trip flights from Stockholm to Toronto, departing on March 3, 2024, and returning on March 10, 2024, and sort the results to find the shortest total travel time.", + "instantiation_dict": {}, + "intent": "Search for round-trip flights from Stockholm to Toronto, departing on March 3, 2024, and returning on March 10, 2024, and sort the results to find the shortest total travel time.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Icelandair, 12:50\u202fPM \u2013 6:15\u202fPM, 11 hr 25 min", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--0", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Find 5 beauty salons with ratings greater than 4.8 in Seattle, WA.", + "instantiation_dict": {}, + "intent": "Find 5 beauty salons with ratings greater than 4.8 in Seattle, WA.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + 
], + "reference_answers": { + "manual_check": { + "answer": "Beehive Salon, Intermezzo Salon & Spa, Cindy's Beauty Salon, The Red Chair Salon, Ella and Oz Salon", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--10", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Search for a park in the state of California called Castle Mountains National Monument and find out its Basic Information.", + "instantiation_dict": {}, + "intent": "Search for a park in the state of California called Castle Mountains National Monument and find out its Basic Information.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "located in Barstow, CA 92311; open 24 hours; phone number is (760) 252-6100", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--20", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Find Tesla Destination Charger closest to the National Air and Space Museum.", + "instantiation_dict": {}, + "intent": "Find Tesla Destination Charger closest to the National Air and Space Museum.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Tesla Destination Charger, 1330 Maryland Ave SW, Washington, DC 20024", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--30", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Locate a parking lot near the Brooklyn Bridge that open 24 hours. 
Review the user comments about it.", + "instantiation_dict": {}, + "intent": "Locate a parking lot near the Brooklyn Bridge that open 24 hours. Review the user comments about it.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "2-68 Division St Garage, <reviews>", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--40", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Find a restaurant in Boston that eats Boston lobster and asks for a rating of 4.6 or higher, and check out what a one-star review says.", + "instantiation_dict": {}, + "intent": "Find a restaurant in Boston that eats Boston lobster and asks for a rating of 4.6 or higher, and check out what a one-star review says.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Boston Sail Loft, 4.6; one star review: Not sure about the rest of the seafood here since I left immediately after trying their AWFUL Chowder. I won't call it clam chowder since I didn't see a single piece of clam. This stuff was more like if you heated up half & Half then sprinkle dill and salt in it. 
It's too bad the tourist think this is how it's supposed to taste.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--9", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Show the rating of Prometheus movie on IMDb and Rotten Tomatoes.", + "instantiation_dict": {}, + "intent": "Show the rating of Prometheus movie on IMDb and Rotten Tomatoes.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "IMDb 7.0/10, Rotten Tomatoes 73%", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--19", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "What are the first 7 bits of the SHA of the Bert's latest commit on GitHub, and what exactly was changed in that commit.", + "instantiation_dict": {}, + "intent": "What are the first 7 bits of the SHA of the Bert's latest commit on GitHub, and what exactly was changed in that commit.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "eedf571, Smaller BERT Models", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--29", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Find out the current world record for the men's 100m sprint.", + "instantiation_dict": {}, + "intent": "Find out the current world record for the men's 100m sprint.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "9.58s 
held by Usain Bolt of Jamaica", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--39", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Identify the top-10 trending travel destination for 2024 through a blog, how many of them are in Asian.", + "instantiation_dict": {}, + "intent": "Identify the top-10 trending travel destination for 2024 through a blog, how many of them are in Asian.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Tokyo, Japan; Seoul, South Korea; Halong Bay, Vietnam; Palawan Island, Philippines; Sapa, Vietnam; Bogota, Colombia; Pattaya, Thailand; Alajuela, Costa Rica; Phnom Penh, Cambodia; Kuala Lumpur, Malaysia. Asian: Tokyo, Japan; Seoul, South Korea; Halong Bay, Vietnam; Palawan Island, Philippines; Sapa, Vietnam; Kuala Lumpur, Malaysia; Phnom Penh, Cambodia", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Huggingface--6", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find the model sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "instantiation_dict": {}, + "intent": "Find the model sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "0.550", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + 
"sites": null, + "task_id": "Huggingface--16", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find information on the latest (as of today's date) pre-trained language model on Huggingface suitable for text classification and briefly describe its intended use case and architecture.", + "instantiation_dict": {}, + "intent": "Find information on the latest (as of today's date) pre-trained language model on Huggingface suitable for text classification and briefly describe its intended use case and architecture.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<model> (today, text classification)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Huggingface--26", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Identify a model on Hugging Face designed for generating travel chats. Obtain information about the model, including its name, size and training framwork.", + "instantiation_dict": {}, + "intent": "Identify a model on Hugging Face designed for generating travel chats. 
Obtain information about the model, including its name, size and training framwork.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "PhilipTheGreat/DiabloGPT-small-Traveller, GPT2LMHeadModel, 510 MB", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Huggingface--36", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Summarize all the payment plans and their advantages in huggingface pricing.", + "instantiation_dict": {}, + "intent": "Summarize all the payment plans and their advantages in huggingface pricing.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "summary of https://huggingface.co/pricing", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--3", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Let g(x) be the integral of x^2 cos(2x). Write the expression of g(x) with solution.", + "instantiation_dict": {}, + "intent": "Let g(x) be the integral of x^2 cos(2x). 
Write the expression of g(x) with solution.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "1/4 (2 x cos(2 x) + (-1 + 2 x^2) sin(2 x)) + Constant", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--13", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "What is 10,000 US dollars in 1980 and in 1970 Worth today?", + "instantiation_dict": {}, + "intent": "What is 10,000 US dollars in 1980 and in 1970 Worth today?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "approximately: 36430; 77325", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--23", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Calculate the population growth rate of Canada from 2020 to 2023 using Wolfram Alpha.", + "instantiation_dict": {}, + "intent": "Calculate the population growth rate of Canada from 2020 to 2023 using Wolfram Alpha.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "mean population growth rate of Canada from 2020 to 2023 is 0.9998% per year", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--33", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Identify the electrical energy output of a hydroelectric power plant named Itaipu Dam in 2023 using Wolfram Alpha.", + "instantiation_dict": {}, + "intent": 
"Identify the electrical energy output of a hydroelectric power plant named Itaipu Dam in 2023 using Wolfram Alpha.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "89.5 TWh (terawatt hours)", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--43", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "instantiation_dict": {}, + "intent": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "35; 12", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + } +] \ No newline at end of file diff --git a/test/tasks/webvoyager_test.json b/test/tasks/webvoyager_test.json index d730714..cede577 100644 --- a/test/tasks/webvoyager_test.json +++ b/test/tasks/webvoyager_test.json @@ -6810,9 +6810,9 @@ "storage_state": null, "start_url": "https://www.booking.com/", "geolocation": null, - "intent_template": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 4, 2024, with amenities that include a Parking, breakfast included, and a rating of 8+ on Booking.com.", + "intent_template": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 2, 2024, with amenities that include a Parking, breakfast included, and a 
rating of 8+ on Booking.com.",
         "instantiation_dict": {},
-        "intent": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 4, 2024, with amenities that include a Parking, breakfast included, and a rating of 8+ on Booking.com.",
+        "intent": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 2, 2024, with amenities that include a Parking, breakfast included, and a rating of 8+ on Booking.com.",
         "require_reset": false,
         "eval": {
             "eval_types": [
diff --git a/test/tests_processor.py b/test/tests_processor.py
index 222f7d4..64b13b9 100644
--- a/test/tests_processor.py
+++ b/test/tests_processor.py
@@ -14,6 +14,7 @@
 from ae.core.autogen_wrapper import AutogenWrapper
 from ae.core.playwright_manager import PlaywrightManager
 from ae.utils.logger import logger
+from ae.utils.response_parser import parse_response
 from autogen.agentchat.chat import ChatResult # type: ignore
 from playwright.async_api import Page
 from tabulate import tabulate
@@ -97,11 +98,19 @@ def save_individual_test_result(test_result: dict[str, str | int | float | None]
 
 def extract_last_response(messages: list[dict[str, Any]]) -> str:
     """Extract the last response message from chat history."""
-    # Iterate over the messages in reverse order
-    for message in reversed(messages):
-        if '##TERMINATE##' in message.get('content', ''):
-            return message['content'].replace("##TERMINATE##", "").strip()
-    return ""
+    try:
+        # Iterate over the messages in reverse order
+        for message in reversed(messages):
+            if message and 'content' in message:
+                content=message.get('content', "")
+                content_json = parse_response(content)
+                final_answer = content_json.get('final_response', None)
+                if final_answer:
+                    return final_answer
+        return ""
+    except:
+        logger.error("Error extracting last response from chat history.")
+        return ""
 
 
 def print_progress_bar(current: int, total: int, bar_length: int = 50) -> None:
@@ -321,9 +330,7 @@ async def run_tests(ag: AutogenWrapper, browser_manager: PlaywrightManager, min_
         browser_manager = browserManager.PlaywrightManager(headless=False)
         await browser_manager.async_initialize()
 
-    context = await browser_manager.get_browser_context()
-    page = await context.new_page() # type: ignore
-
+    page=await browser_manager.get_current_page()
     test_results = []
     max_task_index = len(test_configurations) if not max_task_index else max_task_index
     total_tests = max_task_index
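Review note on the `extract_last_response` change in `test/tests_processor.py`: the new version no longer searches for a `##TERMINATE##` marker. It walks the chat history backwards and returns the first non-empty `final_response` field found in a parsed message. The sketch below illustrates that lookup in isolation. It is a minimal approximation under stated assumptions, not the project's code: `parse_response` here is a hypothetical stand-in for `ae.utils.response_parser.parse_response` (whose real behavior is not shown in this patch) and is assumed to return a dict, empty when the content is not a JSON object. The sample `history` is invented for illustration.

```python
import json
from typing import Any


def parse_response(content: str) -> dict[str, Any]:
    """Hypothetical stand-in for ae.utils.response_parser.parse_response.

    Assumption: the real helper turns a planner message into a dict.
    Here, anything that is not a JSON object becomes an empty dict.
    """
    try:
        parsed = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return {}
    return parsed if isinstance(parsed, dict) else {}


def extract_last_response(messages: list[dict[str, Any]]) -> str:
    """Return the most recent non-empty 'final_response' in the chat history."""
    for message in reversed(messages):
        # Tolerate missing messages and missing or None content.
        content = (message or {}).get("content") or ""
        final_answer = parse_response(content).get("final_response")
        if final_answer:
            return str(final_answer)
    return ""


# Invented sample history: one planning turn, one non-JSON turn,
# one terminating turn, and one message without content.
history = [
    {"content": '{"plan": "1. Search", "next_step": "Open Google", "terminate": "no"}'},
    {"content": "plain text, not JSON"},
    {"content": '{"terminate": "yes", "final_response": "IMDb 7.0/10"}'},
    {"content": None},
]
print(extract_last_response(history))
```

In this sketch, parsing failures are handled inside the stand-in parser rather than by a bare `except:` around the whole loop, so an unexpected error elsewhere would still surface. Whether that trade-off suits the test harness is a judgment call for the author.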

diff --git a/.gitignore b/.gitignore
index b22b1de..cd7c4ff 100644
--- a/.gitignore
+++ b/.gitignore
@@ -155,4 +155,7 @@ cython_debug/
 ae/log_files/*
 ae/temp/*
 test/logs/*
-test/results/*
\ No newline at end of file
+test/results/*
+Pipfile.lock
+requirements.txt
+Pipfile
diff --git a/README.md b/README.md
index d509373..95cdde0 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ This provides a natural language way to interacting with a web browser:
 - Manage and automate tasks on project management platforms (like JIRA) by filtering issues, easing the workflow for users.
 - Provide personal shopping assistance, suggesting products based on the user's needs, such as storage options for game cards.
 
-While Agent-E is growing, it is already equipped to handle a versatile range of tasks, but the best task is the one that you come up with. So, take it for a spin and tell us what you were able to do with it. For more information see our [blog article](https://blog.emergence.ai/2024/03/28/distilling-the-web-agent.html).
+While Agent-E is growing, it is already equipped to handle a versatile range of tasks, but the best task is the one that you come up with. So, take it for a spin and tell us what you were able to do with it. For more information see our [blog article](https://www.emergence.ai/blog/distilling-the-web-for-multi-agent-automation).
 
 ## Quick Start
 
@@ -156,6 +156,39 @@ html_theme = 'sphinx_rtd_theme'
 
 7. Build the documentation, from `docs` directory, run: `sphinx-build -b html . _build`
 
+## Open-source models
+
+Using open-source models is possible through LiteLLM with Ollama. Ollama allows users to run language models locally on their machines, and LiteLLM translates OpenAI-format inputs to local models' endpoints. To use open-source models as the Agent-E backbone, follow the steps below:
+
+1. Install LiteLLM
+    ```bash
+    pip install 'litellm[proxy]'
+    ```
+2. Install Ollama
+    * For Mac and Windows, download [Ollama](https://ollama.com/download).
+    * For Linux:
+        ```bash
+        curl -fsSL https://ollama.com/install.sh | sh
+        ```
+3. Pull Ollama models
+    Before you can use a model, you need to download it from the library. The list of available models is [here](https://ollama.com/library). Here, we use Mistral v0.3:
+    ```bash
+    ollama pull mistral:v0.3
+    ```
+4. Run LiteLLM
+    To run the downloaded model with LiteLLM as a proxy, run:
+    ```bash
+    litellm --model ollama_chat/mistral:v0.3
+    ```
+5. Configure model in Autogen
+    Configure the `.env` file as follows. Note that the model name and API keys are not needed since the local model is already running.
+    ```bash
+    AUTOGEN_MODEL_NAME=NotRequired
+    AUTOGEN_MODEL_API_KEY=NotRequired
+    AUTOGEN_MODEL_BASE_URL=http://0.0.0.0:4000
+    ```
+
+
 ## TODO
 - Action verification
 - Responding from every skill with changes that took place in the DOM (Mutation Observers) so that the LLM can judge whether the skill did execute properly or not
diff --git a/ae/__init__.py b/ae/__init__.py
index cf2b767..acf5751 100644
--- a/ae/__init__.py
+++ b/ae/__init__.py
@@ -1 +1 @@
-from ae import core
\ No newline at end of file
+from ae import core # type: ignore # noqa: F401
diff --git a/ae/config.py b/ae/config.py
index e5bc25a..73beffb 100644
--- a/ae/config.py
+++ b/ae/config.py
@@ -23,4 +23,4 @@
 
 if not os.path.exists(PROJECT_TEMP_PATH):
     os.makedirs(PROJECT_TEMP_PATH)
-    print(f"Created temp folder at: {PROJECT_TEMP_PATH}")
\ No newline at end of file
+    print(f"Created temp folder at: {PROJECT_TEMP_PATH}")
diff --git a/ae/core/__init__.py b/ae/core/__init__.py
index 351a58b..5a412db 100644
--- a/ae/core/__init__.py
+++ b/ae/core/__init__.py
@@ -1,10 +1,8 @@
 from ae.core import agents
 from ae.core import memory
 from ae.core import skills
-
 from ae.core.autogen_wrapper import AutogenWrapper
 from ae.core.playwright_manager import PlaywrightManager
-from ae.core.post_process_responses import final_reply_callback_browser_agent
 from ae.core.post_process_responses import final_reply_callback_user_proxy
 from ae.core.prompts import LLM_PROMPTS
 from ae.core.system_orchestrator import SystemOrchestrator
diff --git a/ae/core/agents/__init__.py b/ae/core/agents/__init__.py
index 23a6035..0639e91 100644
--- a/ae/core/agents/__init__.py
+++ b/ae/core/agents/__init__.py
@@ -1,2 +1 @@
-from ae.core.agents.browser_nav_agent import BrowserNavAgent
-from ae.core.agents.browser_nav_agent_no_skills import BrowserNavAgentNoSkills
\ No newline at end of file
+from ae.core.agents.browser_nav_agent import BrowserNavAgent
\ No newline at end of file
diff --git a/ae/core/agents/browser_nav_agent.py b/ae/core/agents/browser_nav_agent.py
index de20a5f..58dbb76 100644
--- a/ae/core/agents/browser_nav_agent.py
+++ b/ae/core/agents/browser_nav_agent.py
@@ -1,24 +1,26 @@
+from datetime import datetime
 from string import Template
 
 import autogen # type: ignore
 
 from ae.core.memory.static_ltm import get_user_ltm
-from ae.core.post_process_responses import final_reply_callback_browser_agent as print_message_from_user_proxy # type: ignore
-from ae.core.post_process_responses import final_reply_callback_user_proxy as print_message_from_browser_agent # type: ignore
 from ae.core.prompts import LLM_PROMPTS
 from ae.core.skills.click_using_selector import click as click_element
-from ae.core.skills.enter_text_and_click import enter_text_and_click
+
+# from ae.core.skills.enter_text_and_click import enter_text_and_click
 from ae.core.skills.enter_text_using_selector import bulk_enter_text
 from ae.core.skills.enter_text_using_selector import entertext
 from ae.core.skills.get_dom_with_content_type import get_dom_with_content_type
 from ae.core.skills.get_url import geturl
-from ae.core.skills.get_user_input import get_user_input
 from ae.core.skills.open_url import openurl
 from ae.core.skills.pdf_text_extractor import extract_text_from_pdf
 
+#from ae.core.skills.pdf_text_extractor import extract_text_from_pdf
+from ae.core.skills.press_key_combination import press_key_combination
+
 
 class BrowserNavAgent:
-    def __init__(self, config_list, user_proxy_agent: autogen.UserProxyAgent): # type: ignore
+    def __init__(self, config_list, browser_nav_executor: autogen.UserProxyAgent): # type: ignore
         """
         Initialize the BrowserNavAgent and store the AssistantAgent instance
         as an instance attribute for external access.
@@ -27,21 +29,23 @@ def __init__(self, config_list, user_proxy_agent: autogen.UserProxyAgent): # typ
         - config_list: A list of configuration parameters required for AssistantAgent.
         - user_proxy_agent: An instance of the UserProxyAgent class.
         """
-        self.user_proxy_agent = user_proxy_agent
+        self.browser_nav_executor = browser_nav_executor
 
         user_ltm = self.__get_ltm()
         system_message = LLM_PROMPTS["BROWSER_AGENT_PROMPT"]
-
+        system_message = system_message + "\n" + f"Today's date is {datetime.now().strftime('%d %B %Y')}"
         if user_ltm: #add the user LTM to the system prompt if it exists
             user_ltm = "\n" + user_ltm
             system_message = Template(system_message).substitute(basic_user_information=user_ltm)
-        self.agent = autogen.AssistantAgent(
+        self.agent = autogen.ConversableAgent(
             name="browser_navigation_agent",
             system_message=system_message,
             llm_config={
                 "config_list": config_list,
-                "cache_seed": 2,
-                "temperature": 0.0
+                "cache_seed": None,
+                "temperature": 0.0,
+                "top_p": 0.001,
+                "seed":12345
             },
         )
         self.__register_skills()
@@ -59,54 +63,53 @@ def __register_skills(self):
         """
         Register all the skills that the agent can perform.
         """
-        # Register get_user_input skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(get_user_input) # type: ignore
-        # Register get_user_input skill for LLM by assistant agent
-        self.agent.register_for_llm(description=LLM_PROMPTS["GET_USER_INPUT_PROMPT"])(get_user_input) # type: ignore
 
-        # Register openurl skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(openurl) # type: ignore
         # Register openurl skill for LLM by assistant agent
-        self.agent.register_for_llm(description=LLM_PROMPTS["OPEN_URL_PROMPT"])(openurl) # type: ignore
+        self.agent.register_for_llm(description=LLM_PROMPTS["OPEN_URL_PROMPT"])(openurl)
+        # Register openurl skill for execution by user_proxy_agent
+        self.browser_nav_executor.register_for_execution()(openurl)
 
-        # Register enter_text_and_click skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(enter_text_and_click)
         # Register enter_text_and_click skill for LLM by assistant agent
-        self.agent.register_for_llm(description=LLM_PROMPTS["ENTER_TEXT_AND_CLICK_PROMPT"])(enter_text_and_click)
+        # self.agent.register_for_llm(description=LLM_PROMPTS["ENTER_TEXT_AND_CLICK_PROMPT"])(enter_text_and_click)
+        # Register enter_text_and_click skill for execution by user_proxy_agent
+        # self.browser_nav_executor.register_for_execution()(enter_text_and_click)
 
-        # Register get_dom_with_content_type skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(get_dom_with_content_type)
         # Register get_dom_with_content_type skill for LLM by assistant agent
         self.agent.register_for_llm(description=LLM_PROMPTS["GET_DOM_WITH_CONTENT_TYPE_PROMPT"])(get_dom_with_content_type)
+        # Register get_dom_with_content_type skill for execution by user_proxy_agent
+        self.browser_nav_executor.register_for_execution()(get_dom_with_content_type)
 
-        # Register click_element skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(click_element)
         # Register click_element skill for LLM by assistant agent
-        #self.agent.register_for_llm(description=LLM_PROMPTS["CLICK_PROMPT_ACCESSIBILITY"])(click_element)
         self.agent.register_for_llm(description=LLM_PROMPTS["CLICK_PROMPT"])(click_element)
+        # Register click_element skill for execution by user_proxy_agent
+        self.browser_nav_executor.register_for_execution()(click_element)
 
-        # Register geturl skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(geturl)
         # Register geturl skill for LLM by assistant agent
         self.agent.register_for_llm(description=LLM_PROMPTS["GET_URL_PROMPT"])(geturl)
+        # Register geturl skill for execution by user_proxy_agent
+        self.browser_nav_executor.register_for_execution()(geturl)
 
-        # Register bulk_enter_text skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(bulk_enter_text)
         # Register bulk_enter_text skill for LLM by assistant agent
         self.agent.register_for_llm(description=LLM_PROMPTS["BULK_ENTER_TEXT_PROMPT"])(bulk_enter_text)
+        # Register bulk_enter_text skill for execution by user_proxy_agent
+        self.browser_nav_executor.register_for_execution()(bulk_enter_text)
 
-        # Register entertext skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(entertext)
         # Register entertext skill for LLM by assistant agent
         self.agent.register_for_llm(description=LLM_PROMPTS["ENTER_TEXT_PROMPT"])(entertext)
-        # Register entertext skill for execution by user_proxy_agent
-        self.user_proxy_agent.register_for_execution()(extract_text_from_pdf)
+        self.browser_nav_executor.register_for_execution()(entertext)
 
+        # Register entertext skill for LLM by assistant agent
+        self.agent.register_for_llm(description=LLM_PROMPTS["PRESS_KEY_COMBINATION_PROMPT"])(press_key_combination)
+        # Register entertext skill for execution by user_proxy_agent
+        self.browser_nav_executor.register_for_execution()(press_key_combination)
 
+        self.agent.register_for_llm(description=LLM_PROMPTS["EXTRACT_TEXT_FROM_PDF_PROMPT"])(extract_text_from_pdf)
+        self.browser_nav_executor.register_for_execution()(extract_text_from_pdf)
 
+        '''
 
         # Register reply function for printing messages
-        self.user_proxy_agent.register_reply( # type: ignore
+        self.browser_nav_executor.register_reply( # type: ignore
             [autogen.Agent, None],
             reply_func=print_message_from_user_proxy,
             config={"callback": None},
@@ -116,3 +119,6 @@ def __register_skills(self):
             reply_func=print_message_from_browser_agent,
             config={"callback": None},
         )
+        '''
+        # print(f">>> Function map: {self.browser_nav_executor.function_map}") # type: ignore
+        # print(">>> Registered skills for BrowserNavAgent and BrowserNavExecutorAgent")
diff --git a/ae/core/agents/browser_nav_agent_no_skills.py b/ae/core/agents/browser_nav_agent_no_skills.py
deleted file mode 100644
index 44042ec..0000000
--- a/ae/core/agents/browser_nav_agent_no_skills.py
+++ /dev/null
@@ -1,41 +0,0 @@
-import autogen # type: ignore
-
-from ae.core.prompts import LLM_PROMPTS
-
-
-class BrowserNavAgentNoSkills:
-    def __init__(self, config_list, user_proxy_agent: autogen.UserProxyAgent): # type: ignore
-        """
-        Initialize the BrowserNavAgentNoSkills class and registers any necessary skills.
-
-        Parameters:
-        - config_list (list): Configuration parameters required for AssistantAgent.
-        - user_proxy_agent (UserProxyAgent): An instance of the UserProxyAgent class.
-
-        Returns:
-        None
-        """
-        self.user_proxy_agent = user_proxy_agent
-        self.agent = autogen.AssistantAgent(
-            name="browser_navigation_agent_no_skills",
-            system_message=LLM_PROMPTS["BROWSER_AGENT_NO_SKILLS_PROMPT"],
-            llm_config={
-                "config_list": config_list,
-                "cache_seed": 41,
-                "temperature": 0.0
-            },
-        )
-        self.__register_skills()
-
-
-    def __register_skills(self):
-        """
-        Register all the skills that the agent can perform.
-
-        Parameters:
-        None
-
-        Returns:
-        None
-        """
-        pass
diff --git a/ae/core/agents/high_level_planner_agent.py b/ae/core/agents/high_level_planner_agent.py
new file mode 100644
index 0000000..2d440b9
--- /dev/null
+++ b/ae/core/agents/high_level_planner_agent.py
@@ -0,0 +1,61 @@
+from datetime import datetime
+from string import Template
+
+import autogen # type: ignore
+from autogen import ConversableAgent # type: ignore
+
+from ae.core.memory.static_ltm import get_user_ltm
+from ae.core.post_process_responses import final_reply_callback_planner_agent as print_message_as_planner # type: ignore
+from ae.core.prompts import LLM_PROMPTS
+from ae.core.skills.get_user_input import get_user_input
+
+
+class PlannerAgent:
+    def __init__(self, config_list, user_proxy_agent:ConversableAgent): # type: ignore
+        """
+        Initialize the PlannerAgent and store the AssistantAgent instance
+        as an instance attribute for external access.
+
+        Parameters:
+        - config_list: A list of configuration parameters required for AssistantAgent.
+        - user_proxy_agent: An instance of the UserProxyAgent class.
+        """
+
+        user_ltm = self.__get_ltm()
+        system_message = LLM_PROMPTS["PLANNER_AGENT_PROMPT"]
+
+        if user_ltm: #add the user LTM to the system prompt if it exists
+            user_ltm = "\n" + user_ltm
+            system_message = Template(system_message).substitute(basic_user_information=user_ltm)
+        system_message = system_message + "\n" + f"Today's date is {datetime.now().strftime('%d %B %Y')}"
+        self.agent = autogen.AssistantAgent(
+            name="planner_agent",
+            system_message=system_message,
+            llm_config={
+                "config_list": config_list,
+                "cache_seed": None,
+                "temperature": 0.0,
+                "top_p": 0.001,
+                "seed":12345
+            },
+        )
+
+        # Register get_user_input skill for LLM by assistant agent
+        self.agent.register_for_llm(description=LLM_PROMPTS["GET_USER_INPUT_PROMPT"])(get_user_input)
+        # Register get_user_input skill for execution by user_proxy_agent
+        user_proxy_agent.register_for_execution()(get_user_input)
+
+        self.agent.register_reply( # type: ignore
+            [autogen.AssistantAgent, None],
+            reply_func=print_message_as_planner,
+            config={"callback": None},
+            ignore_async_in_sync_chat=True
+        )
+
+    def __get_ltm(self):
+        """
+        Get the long term memory of the user.
+        returns: str | None - The user LTM or None if not found.
+        """
+        return get_user_ltm()
+
diff --git a/ae/core/autogen_wrapper.py b/ae/core/autogen_wrapper.py
index 3759ad2..9b53864 100644
--- a/ae/core/autogen_wrapper.py
+++ b/ae/core/autogen_wrapper.py
@@ -1,3 +1,4 @@
+import asyncio
 import json
 import os
 import tempfile
@@ -7,6 +8,7 @@
 from typing import Any
 
 import autogen # type: ignore
+import nest_asyncio # type: ignore
 import openai
 
 #from autogen import Cache
@@ -14,10 +16,16 @@
 
 from ae.config import SOURCE_LOG_FOLDER_PATH
 from ae.core.agents.browser_nav_agent import BrowserNavAgent
-from ae.core.agents.browser_nav_agent_no_skills import BrowserNavAgentNoSkills
+from ae.core.agents.high_level_planner_agent import PlannerAgent
+from ae.core.post_process_responses import final_reply_callback_planner_agent as notify_planner_messages # type: ignore
 from ae.core.prompts import LLM_PROMPTS
+from ae.core.skills.get_url import geturl
+from ae.utils.autogen_sequential_function_call import UserProxyAgent_SequentialFunctionExecution
 from ae.utils.logger import logger
+from ae.utils.response_parser import parse_response
+from ae.utils.ui_messagetype import MessageType
 
+nest_asyncio.apply() # type: ignore
 
 class AutogenWrapper:
     """
@@ -32,27 +40,30 @@ class AutogenWrapper:
 
     """
 
-    def __init__(self, max_chat_round: int = 50):
+    def __init__(self, max_chat_round: int = 1000):
         self.number_of_rounds = max_chat_round
-        self.agents_map: dict[str, autogen.UserProxyAgent | autogen.AssistantAgent] | None = None
+
+        self.agents_map: dict[str, UserProxyAgent_SequentialFunctionExecution | autogen.AssistantAgent | autogen.ConversableAgent ] | None = None
+
         self.config_list: list[dict[str, str]] | None = None
         self.chat_logs_dir: str = SOURCE_LOG_FOLDER_PATH
 
     @classmethod
-    async def create(cls, agents_needed: list[str] | None = None, max_chat_round: int = 50):
+    async def create(cls, agents_needed: list[str] | None = None, max_chat_round: int = 1000):
         """
         Create an instance of AutogenWrapper.
 
         Args:
-            agents_needed (list[str], optional): The list of agents needed. If None, then ["user_proxy", "browser_nav_agent"] will be used.
+            agents_needed (list[str], optional): The list of agents needed. If None, then ["user", "browser_nav_executor", "planner_agent", "browser_nav_agent"] will be used.
             max_chat_round (int, optional): The maximum number of chat rounds. Defaults to 50.
 
         Returns:
             AutogenWrapper: An instance of AutogenWrapper.
 
         """
+        print(f">>> Creating AutogenWrapper with {agents_needed} and {max_chat_round} rounds.")
         if agents_needed is None:
-            agents_needed = ["user_proxy", "browser_nav_agent"]
+            agents_needed = ["user", "browser_nav_executor", "planner_agent", "browser_nav_agent"]
         # Create an instance of cls
         self = cls(max_chat_round)
         load_dotenv()
@@ -60,7 +71,7 @@ async def create(cls, agents_needed: list[str] | None = None, max_chat_round: in
 
         autogen_model_name = os.getenv("AUTOGEN_MODEL_NAME")
         if not autogen_model_name:
-            autogen_model_name = "gpt-4-turbo-preview"
+            autogen_model_name = "gpt-4-turbo"
             logger.warning(f"Cannot find AUTOGEN_MODEL_NAME in the environment variables, setting it to default {autogen_model_name}.")
 
         autogen_model_api_key = os.getenv("AUTOGEN_MODEL_API_KEY")
@@ -92,6 +103,63 @@ async def create(cls, agents_needed: list[str] | None = None, max_chat_round: in
         self.config_list = autogen.config_list_from_json(env_or_file=temp_file_path, filter_dict={"model": {autogen_model_name}}) # type: ignore
         self.agents_map = await self.__initialize_agents(agents_needed)
 
+        def trigger_nested_chat(manager: autogen.ConversableAgent):
+            content:str=manager.last_message()["content"] # type: ignore
+            content_json = parse_response(content) # type: ignore
+            next_step = content_json.get('next_step', None)
+            plan = content_json.get('plan', None)
+            if plan is not None:
+                notify_planner_messages(plan, message_type=MessageType.PLAN)
+
+            if next_step is None:
+                notify_planner_messages("Received no response, terminating..", message_type=MessageType.INFO) # type: ignore
+                return False
+            else:
+                notify_planner_messages(next_step, message_type=MessageType.STEP) # type: ignore
+                return True
+
+        def get_url() -> str:
+            return asyncio.run(geturl())
+
+        def my_custom_summary_method(sender: autogen.ConversableAgent,recipient: autogen.ConversableAgent, summary_args: dict ) : # type: ignore
+            messages_str_keys = {str(key): value for key, value in sender.chat_messages.items()} # type: ignore
+            self.__save_chat_log(list(messages_str_keys.values())[0]) # type: ignore
+            last_message=recipient.last_message(sender)["content"] # type: ignore
+            if not last_message or last_message.strip() == "": # type: ignore
+                return "I received an empty message. Try a different approach."
+            elif "##TERMINATE TASK##" in last_message:
+                last_message=last_message.replace("##TERMINATE TASK##", "") # type: ignore
+                last_message=last_message+" "+ get_url() # type: ignore
+                notify_planner_messages(last_message, message_type=MessageType.ACTION) # type: ignore
+                return last_message # type: ignore
+            return recipient.last_message(sender)["content"] # type: ignore
+
+        def reflection_message(recipient, messages, sender, config): # type: ignore
+            last_message=messages[-1]["content"] # type: ignore
+            content_json = parse_response(last_message) # type: ignore
+            next_step = content_json.get('next_step', None)
+
+            if next_step is None:
+                print ("Message to nested chat returned None")
+                return None
+            else:
+                next_step = next_step.strip() +" " + get_url() # type: ignore
+                return next_step # type: ignore
+
+        # print(f">>> Registering nested chat. Available agents: {self.agents_map}")
+        self.agents_map["user"].register_nested_chats( # type: ignore
+            [
+                {
+                    "sender": self.agents_map["browser_nav_executor"],
+                    "recipient": self.agents_map["browser_nav_agent"],
+                    "message":reflection_message,
+                    "max_turns": self.number_of_rounds,
+                    "summary_method": my_custom_summary_method,
+                }
+            ],
+            trigger=trigger_nested_chat, # type: ignore
+        )
+
         return self
 
 
@@ -134,55 +202,102 @@ async def __initialize_agents(self, agents_needed: list[str]):
             dict: A dictionary of agent instances.
 
         """
-        if "user_proxy" not in agents_needed:
-            raise ValueError("user_proxy agent is required in the list of needed agents.")
+        agents_map: dict[str, UserProxyAgent_SequentialFunctionExecution | autogen.ConversableAgent]= {}
 
-        agents_map: dict[str, autogen.AssistantAgent | autogen.UserProxyAgent]= {}
+        user_delegate_agent = await self.__create_user_delegate_agent()
+        agents_map["user"] = user_delegate_agent
+        agents_needed.remove("user")
 
-        user_proxy_agent = await self.__create_user_proxy_agent()
-        user_proxy_agent.reset()
-        agents_map["user_proxy"] = user_proxy_agent
-        agents_needed.remove("user_proxy")
+        browser_nav_executor = self.__create_browser_nav_executor_agent()
+        agents_map["browser_nav_executor"] = browser_nav_executor
+        agents_needed.remove("browser_nav_executor")
 
         for agent_needed in agents_needed:
             if agent_needed == "browser_nav_agent":
-                browser_nav_agent: autogen.AssistantAgent = self.__create_browser_nav_agent(user_proxy_agent)
-                browser_nav_agent.reset()
+                browser_nav_agent: autogen.ConversableAgent = self.__create_browser_nav_agent(agents_map["browser_nav_executor"] )
                 agents_map["browser_nav_agent"] = browser_nav_agent
-            elif agent_needed == "browser_nav_agent_no_skills":
-                browser_nav_agent_no_skills = self.__create_browser_nav_agent_no_skills(user_proxy_agent)
-                browser_nav_agent_no_skills.reset()
-                agents_map["browser_nav_agent_no_skills"] = browser_nav_agent_no_skills
+            elif agent_needed == "planner_agent":
+                planner_agent =
self.__create_planner_agent(user_delegate_agent) + agents_map["planner_agent"] = planner_agent else: raise ValueError(f"Unknown agent type: {agent_needed}") - return agents_map - async def __create_user_proxy_agent(self): + async def __create_user_delegate_agent(self) -> autogen.ConversableAgent: """ - Create a UserProxyAgent instance. + Create a ConversableAgent instance. Returns: - autogen.UserProxyAgent: An instance of UserProxyAgent. + autogen.ConversableAgent: An instance of ConversableAgent. """ - user_proxy_agent = autogen.UserProxyAgent( - name="user_proxy", + def is_planner_termination_message(x: dict[str, str])->bool: # type: ignore + should_terminate = False + function: Any = x.get("function", None) + if function is not None: + return False + + content:Any = x.get("content", "") + if content is None: + content = "" + should_terminate = True + else: + try: + content_json = parse_response(content) + _terminate = content_json.get('terminate', "no") + final_response = content_json.get('final_response', None) + if(_terminate == "yes"): + should_terminate = True + if final_response: + notify_planner_messages(final_response, message_type=MessageType.ANSWER) + except json.JSONDecodeError: + logger.error(f"Error decoding JSON response:\n{content}.\nTerminating..") + should_terminate = True + + return should_terminate # type: ignore + + task_delegate_agent = UserProxyAgent_SequentialFunctionExecution( + name="user", + llm_config=False, system_message=LLM_PROMPTS["USER_AGENT_PROMPT"], - is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().upper().endswith("##TERMINATE##"), # type: ignore + is_termination_msg=is_planner_termination_message, # type: ignore human_input_mode="NEVER", max_consecutive_auto_reply=self.number_of_rounds, - code_execution_config={ - "last_n_messages": 1, - "work_dir": "./", - "use_docker": False, - } ) + return task_delegate_agent - return user_proxy_agent + def __create_browser_nav_executor_agent(self): + """ + 
Create a UserProxyAgent instance for executing browser control. + + Returns: + autogen.UserProxyAgent: An instance of UserProxyAgent. - def __create_browser_nav_agent(self, user_proxy_agent: autogen.UserProxyAgent) -> autogen.AssistantAgent: + """ + def is_browser_executor_termination_message(x: dict[str, str])->bool: # type: ignore + tools_call:Any = x.get("tool_calls", "") + if tools_call : + return False + else: + return True + + browser_nav_executor_agent = UserProxyAgent_SequentialFunctionExecution( + name="browser_nav_executor", + is_termination_msg=is_browser_executor_termination_message, + human_input_mode="NEVER", + llm_config=None, + max_consecutive_auto_reply=self.number_of_rounds, + code_execution_config={ + "last_n_messages": 1, + "work_dir": "tasks", + "use_docker": False, + }, + ) + print(">>> Created browser_nav_executor_agent:", browser_nav_executor_agent) + return browser_nav_executor_agent + + def __create_browser_nav_agent(self, user_proxy_agent: UserProxyAgent_SequentialFunctionExecution) -> autogen.ConversableAgent: """ Create a BrowserNavAgent instance. @@ -197,21 +312,16 @@ def __create_browser_nav_agent(self, user_proxy_agent: autogen.UserProxyAgent) - #print(">>> browser agent tools:", json.dumps(browser_nav_agent.agent.llm_config.get("tools"), indent=2)) return browser_nav_agent.agent - - def __create_browser_nav_agent_no_skills(self, user_proxy_agent: autogen.UserProxyAgent): + def __create_planner_agent(self, assistant_agent: autogen.ConversableAgent): """ - Create a BrowserNavAgentNoSkills instance. This is mainly used for exploration at this point - - Args: - user_proxy_agent (autogen.UserProxyAgent): The instance of UserProxyAgent that was created. + Create a Planner Agent instance. This is mainly used for exploration at this point Returns: - autogen.AssistantAgent: An instance of BrowserNavAgentNoSkills. + autogen.AssistantAgent: An instance of PlannerAgent. 
""" - browser_nav_agent_no_skills = BrowserNavAgentNoSkills(self.config_list, user_proxy_agent) # type: ignore - return browser_nav_agent_no_skills.agent - + planner_agent = PlannerAgent(self.config_list, assistant_agent) # type: ignore + return planner_agent.agent async def process_command(self, command: str, current_url: str | None = None) -> autogen.ChatResult | None: """ @@ -227,7 +337,7 @@ async def process_command(self, command: str, current_url: str | None = None) -> """ current_url_prompt_segment = "" if current_url: - current_url_prompt_segment = f"Current URL: {current_url}" + current_url_prompt_segment = f"Current Page: {current_url}" prompt = Template(LLM_PROMPTS["COMMAND_EXECUTION_PROMPT"]).substitute(command=command, current_url_prompt_segment=current_url_prompt_segment) logger.info(f"Prompt for command: {prompt}") @@ -236,15 +346,9 @@ async def process_command(self, command: str, current_url: str | None = None) -> if self.agents_map is None: raise ValueError("Agents map is not initialized.") - if "browser_nav_no_skills" in self.agents_map: - browser_nav_agent = self.agents_map["browser_nav_agent_no_skills"] - elif "browser_nav_agent" in self.agents_map: - browser_nav_agent = self.agents_map["browser_nav_agent"] - else: - raise ValueError(f"No browser navigation agent found. in agents_map {self.agents_map}") - - result = await self.agents_map["user_proxy"].a_initiate_chat( # type: ignore - browser_nav_agent, # self.manager + result=await self.agents_map["user"].a_initiate_chat( # type: ignore + self.agents_map["planner_agent"], # self.manager # type: ignore + max_turns=self.number_of_rounds, #clear_history=True, message=prompt, silent=False, @@ -258,3 +362,4 @@ async def process_command(self, command: str, current_url: str | None = None) -> except openai.BadRequestError as bre: logger.error(f"Unable to process command: \"{command}\". 
{bre}") traceback.print_exc() + diff --git a/ae/core/playwright_manager.py b/ae/core/playwright_manager.py index 6fb8b06..858a2cd 100644 --- a/ae/core/playwright_manager.py +++ b/ae/core/playwright_manager.py @@ -11,8 +11,10 @@ from ae.core.ui_manager import UIManager from ae.utils.dom_mutation_observer import dom_mutation_change_detected from ae.utils.dom_mutation_observer import handle_navigation_for_mutation_observer +from ae.utils.js_helper import beautify_plan_message from ae.utils.js_helper import escape_js_message from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType # Enusres that playwright does not wait for font loading when taking screenshots. Reference: https://github.com/microsoft/playwright/issues/28995 os.environ["PW_TEST_SCREENSHOT_NO_FONTS_READY"] = "1" @@ -247,12 +249,11 @@ async def set_navigation_handler(self): page.on("domcontentloaded", handle_navigation_for_mutation_observer) # type: ignore await page.expose_function("dom_mutation_change_detected", dom_mutation_change_detected) # type: ignore - async def set_overlay_state_handler(self): logger.debug("Setting overlay state handler") context = await self.get_browser_context() await context.expose_function('overlay_state_changed', self.overlay_state_handler) # type: ignore - + await context.expose_function('show_steps_state_changed',self.show_steps_state_handler) # type: ignore async def overlay_state_handler(self, is_collapsed: bool): page = await self.get_current_page() @@ -260,46 +261,78 @@ async def overlay_state_handler(self, is_collapsed: bool): if not is_collapsed: await self.ui_manager.update_overlay_chat_history(page) + async def show_steps_state_handler(self, show_details: bool): + page = await self.get_current_page() + await self.ui_manager.update_overlay_show_details(show_details, page) async def set_user_response_handler(self): context = await self.get_browser_context() await context.expose_function('user_response', self.receive_user_response) # type: 
ignore - async def notify_user(self, message: str): + async def notify_user(self, message: str, message_type: MessageType = MessageType.STEP): """ Notify the user with a message. Args: message (str): The message to notify the user with. + message_type (enum, optional): Values can be 'PLAN', 'QUESTION', 'ANSWER', 'INFO', 'STEP'. Defaults to 'STEP'. + To Do: Convert to Enum. """ - logger.debug(f"Notification: \"{message}\" being sent to the user.") + + if message.startswith(":"): + message = message[1:] + + if message.endswith(","): + message = message[:-1] + + if message_type == MessageType.PLAN: + message = beautify_plan_message(message) + message = "Plan:\n" + message + elif message_type == MessageType.STEP: + if "confirm" in message.lower(): + message = "Verify: " + message + else: + message = "Next step: " + message + elif message_type == MessageType.QUESTION: + message = "Question: " + message + elif message_type == MessageType.ANSWER: + message = "Response: " + message + safe_message = escape_js_message(message) - self.ui_manager.new_system_message(safe_message) - try: - js_code = f"addSystemMessage({safe_message}, false);" + self.ui_manager.new_system_message(safe_message, message_type) + + if self.ui_manager.overlay_show_details == False: # noqa: E712 + if message_type not in (MessageType.PLAN, MessageType.QUESTION, MessageType.ANSWER, MessageType.INFO): + return + if self.ui_manager.overlay_show_details == True: # noqa: E712 + if message_type not in (MessageType.PLAN, MessageType.QUESTION , MessageType.ANSWER, MessageType.INFO, MessageType.STEP): + return + + safe_message_type = escape_js_message(message_type.value) + try: + js_code = f"addSystemMessage({safe_message}, is_awaiting_user_response=false, message_type={safe_message_type});" page = await self.get_current_page() await page.evaluate(js_code) - logger.debug("User notification completed") except Exception as e: - logger.debug(f"Failed to notify user with message \"{message}\". 
However, most likey this will work itself out after the page loads: {e}") + logger.error(f"Failed to notify user with message \"{message}\". However, most likely this will work itself out after the page loads: {e}") async def highlight_element(self, selector: str, add_highlight: bool): try: page: Page = await self.get_current_page() if add_highlight: - # Add the 'pulsate' class to the element + # Add the 'agente-ui-automation-highlight' class to the element. This class is used to apply the fading border. await page.eval_on_selector(selector, '''e => { let originalBorderStyle = e.style.border; - e.classList.add('ui_automation_pulsate'); + e.classList.add('agente-ui-automation-highlight'); e.addEventListener('animationend', () => { - e.classList.remove('ui_automation_pulsate') + e.classList.remove('agente-ui-automation-highlight') });}''') logger.debug(f"Applied pulsating border to element with selector {selector} to indicate text entry operation") else: - # Remove the 'pulsate' class from the element - await page.eval_on_selector(selector, "e => e.classList.remove('ui_automation_pulsate')") + # Remove the 'agente-ui-automation-highlight' class from the element. + await page.eval_on_selector(selector, "e => e.classList.remove('agente-ui-automation-highlight')") logger.debug(f"Removed pulsating border from element with selector {selector} after text entry operation") except Exception: # This is not significant enough to fail the operation @@ -328,11 +361,11 @@ async def prompt_user(self, message: str) -> str: page = await self.get_current_page() await self.ui_manager.show_overlay(page) - self.log_system_message(message) # add the message to history after the overlay is opened to avoid double adding it. add_system_message below will add it + self.log_system_message(message, MessageType.QUESTION) # add the message to history after the overlay is opened to avoid double adding it. 
add_system_message below will add it safe_message = escape_js_message(message) - js_code = f"addSystemMessage({safe_message}, is_awaiting_user_response=true);" - print(">>> nofiy user about to exec JS code:", js_code) + + js_code = f"addSystemMessage({safe_message}, is_awaiting_user_response=true, message_type='question');" await page.evaluate(js_code) await self.user_response_event.wait() @@ -371,7 +404,7 @@ async def take_screenshots(self, name: str, page: Page|None, full_page: bool = T try: await page.wait_for_load_state(state=load_state, timeout=take_snapshot_timeout) # type: ignore await page.screenshot(path=screenshot_path, full_page=full_page, timeout=take_snapshot_timeout, caret="initial", scale="device") - print(f"Screen shot saved to: {screenshot_path}") + logger.debug(f"Screen shot saved to: {screenshot_path}") except Exception as e: logger.error(f"Failed to take screenshot and save to \"{screenshot_path}\". Error: {e}") @@ -386,15 +419,25 @@ def log_user_message(self, message: str): self.ui_manager.new_user_message(message) - def log_system_message(self, message: str): + def log_system_message(self, message: str, type: MessageType = MessageType.STEP): """ Log a system message. Args: message (str): The system message to log. """ - self.ui_manager.new_system_message(message) + self.ui_manager.new_system_message(message, type) + + async def update_processing_state(self, processing_state: str): + """ + Update the processing state of the overlay. 
+ + Args: + processing_state (str): "init", "processing", "done" + """ + page = await self.get_current_page() + await self.ui_manager.update_processing_state(processing_state, page) async def command_completed(self, command: str, elapsed_time: float | None = None): """ diff --git a/ae/core/post_process_responses.py b/ae/core/post_process_responses.py index 770af14..1a7e492 100644 --- a/ae/core/post_process_responses.py +++ b/ae/core/post_process_responses.py @@ -1,9 +1,11 @@ +import asyncio from typing import Any import autogen # type: ignore from ae.core.playwright_manager import PlaywrightManager from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType def final_reply_callback_user_proxy(recipient: autogen.ConversableAgent, messages: list[dict[str, Any]], sender: autogen.Agent, config: dict[str, Any]): @@ -22,7 +24,6 @@ def final_reply_callback_user_proxy(recipient: autogen.ConversableAgent, message Tuple[bool, None]: A tuple indicating whether the processing should stop and the response to be sent. """ global last_agent_response - last_message = messages[-1] logger.debug(f"Post Process Message (User Proxy):{last_message}") if last_message.get('content') and "##TERMINATE##" in last_message['content']: @@ -35,36 +36,8 @@ def final_reply_callback_user_proxy(recipient: autogen.ConversableAgent, message return False, None - - - -async def final_reply_callback_browser_agent(recipient: autogen.ConversableAgent, messages: list[dict[str, Any]], sender: autogen.Agent, config: dict[str, Any]): - """ - Callback function that is called each time the browser agent receives a message. - It picks the last message from the list of messages and checks if it contains the termination signal. - If the termination signal is found, it extracts the final response and outputs it. - - Args: - recipient (autogen.ConversableAgent): The recipient of the message. - messages (Optional[list[dict[str, Any]]]): The list of messages received by the agent. 
- sender (Optional[autogen.Agent]): The sender of the message. - config (Optional[Any]): Additional configuration parameters. - - Returns: - Tuple[bool, None]: A tuple indicating whether the processing should stop and the response to be sent. - """ - global last_agent_response - - last_message = messages[-1] - print(f"Post Process Message (Browser Agent):{last_message}") - if last_message.get('content') and "##TERMINATE##" in last_message['content']: - last_agent_response = last_message['content'].replace("##TERMINATE##", "").strip() - if last_agent_response: - browser_manager = PlaywrightManager(browser_type='chromium', headless=False) - await browser_manager.notify_user(last_agent_response) - logger.debug("*****Final Reply*****") - logger.debug(f"Final Response: {last_agent_response}") - logger.debug("*********************") - return True, None - - return False, None +def final_reply_callback_planner_agent(message:str, message_type:MessageType = MessageType.STEP): # type: ignore + browser_manager = PlaywrightManager(browser_type='chromium', headless=False) + loop = asyncio.get_event_loop() + loop.run_until_complete(browser_manager.notify_user(message, message_type=message_type)) + return False, None # required to ensure the agent communication flow continues diff --git a/ae/core/prompts.py b/ae/core/prompts.py index fcf7261..bda54ed 100644 --- a/ae/core/prompts.py +++ b/ae/core/prompts.py @@ -1,93 +1,203 @@ LLM_PROMPTS = { - "USER_AGENT_PROMPT": """A proxy for the user for executing the user commands.""", - - "BROWSER_AGENT_PROMPT": """You will perform web navigation tasks, which may include logging into websites. - Use the provided JSON DOM representation for element location or text summarization. - Interact with pages using only the "mmid" attribute in DOM elements. - You must extract mmid value from the fetched DOM, do not conjure it up. - For additional user input, request it directly. - Execute actions sequentially to avoid navigation timing issues. 
Once a task is completed, confirm completion with ##TERMINATE##. - The given functions are NOT parallelizable. They are intended for sequential execution. - If you need to call multiple functions in a task step, call one function at a time. Wait for the function's response before invoking the next function. This is important to avoid collision. - Some of the provided functions do provide bulk operations, for those, the function description will clearly mention it. - For information seeking tasks where a text response is expected, the returned answer should answer the question as directly as possible and should be followed by ##TERMINATE##. - If your approach fails try again with a different approach in hopes of a better outcome, but don't do this endlessly. - Ensure that user questions are answered from the DOM and not from memory or assumptions. - Since your knowledge can be outdated, if a URL that you provide is not found, use a different approach to find the correct website to navigate to. - Do not solicit further user requests. If user response is lacking, terminate the conversation with ##TERMINATE##.$basic_user_information""", - - "ENTER_TEXT_AND_CLICK_PROMPT": """This skill enters text into a specified element and clicks another element, both identified by their DOM selector queries. - Ideal for seamless actions like submitting search queries, this integrated approach ensures superior performance over separate text entry and click commands. - Successfully completes when both actions are executed without errors, returning True; otherwise, it provides False or an explanatory message of any failure encountered. - Always prefer this dual-action skill for tasks that combine text input and element clicking to leverage its streamlined operation.""", - - "OPEN_URL_PROMPT": """Opens a specified URL in the web browser instance. 
Returns url of the new page if successful or appropriate error message if the page could not be opened.""", - - "COMMAND_EXECUTION_PROMPT": """Execute the user task "$command" using the appropriate agent. $current_url_prompt_segment""", - - "GET_USER_INPUT_PROMPT": """Get clarification from the user or wait for user to perform an action on webpage. This is useful e.g. when you encounter a login or captcha and requires the user to intervene. This skill will also be useful when task is ambigious and you need more clarification from the user (e.g. ["which source website to use to accomplish a task"], ["Enter your credentials on your webpage and type done to continue"]). Use this skill sparingly and only when absolutely needed.""", - - "GET_DOM_WITHOUT_CONTENT_TYPE_PROMPT": """Retrieves the DOM of the current web browser page. - Each DOM element will have an \"mmid\" attribute injected for ease of DOM interaction. - Returns a minified representation of the HTML DOM where each HTML DOM Element has an attribute called \"mmid\" for ease of DOM query selection. When \"mmid\" attribute is available, use it for DOM query selectors.""", - - # This one below had all three content types including input_fields - "GET_DOM_WITH_CONTENT_TYPE_PROMPT": """Retrieves the DOM of the current web site based on the given content type. - The DOM representation returned contains items ordered in the same way they appear on the page. Keep this in mind when executing user requests that contain ordinals or numbered items. 
- Here is an explanation of the content_types: - text_only - returns plain text representing all the text in the web site - input_fields - returns a JSON string containing a list of objects representing input html elements and their attributes with mmid attribute in every element - all_fields - returns a JSON string containing a list of objects representing ALL html elements and their attributes with mmid attribute in every element - 'input_fields' is most suitable to retrieve input fields from the DOM for example a search field or a button to press.""", - - "GET_ACCESSIBILITY_TREE": """Retrieves the accessibility tree of the current web site. - The DOM representation returned contains items ordered in the same way they appear on the page. Keep this in mind when executing user requests that contain ordinals or numbered items.""", - - "CLICK_PROMPT": """Executes a click action on the element matching the given mmid attribute value. It is best to use mmid attribute as the selector. - Returns Success if click was successful or appropriate error message if the element could not be clicked.""", - - "CLICK_PROMPT_ACCESSIBILITY": """Executes a click action on the element a name and role. - Returns Success if click was successful or appropriate error message if the element could not be clicked.""", - - "GET_URL_PROMPT": """Get the full URL of the current web page/site. If the user command seems to imply an action that would be suitable for an already open website in their browser, use this to fetch current website URL.""", - - "ENTER_TEXT_PROMPT": """Single enter given text in the DOM element matching the given mmid attribute value. This will only enter the text and not press enter or anything else. - Returns Success if text entry was successful or appropriate error message if text could not be entered.""", - - "BULK_ENTER_TEXT_PROMPT": """Bulk enter text in multiple DOM fields. To be used when there are multiple fields to be filled on the same page. 
- Enters text in the DOM elements matching the given mmid attribute value. - The input will receive a list of objects containing the DOM query selector and the text to enter. - This will only enter the text and not press enter or anything else. - Returns each selector and the result for attempting to enter text.""", - - "PRESS_KEY_COMBINATION_PROMPT": """Presses the given key combination on the current web page. - This is useful for keycombinations or even just pressing the enter button to submit a search query.""", - - "PRESS_ENTER_KEY_PROMPT": """Presses the enter key in the given html field. This is most useful on text input fields.""", - - "EXTRACT_TEXT_FROM_PDF_PROMPT": """Extracts text from a PDF file hosted at the given URL.""", - - "BROWSER_AGENT_NO_SKILLS_PROMPT": """You are an autonomous agent tasked with performing web navigation on a Playwright instance, including logging into websites and executing other web-based actions. - You will receive user commands, formulate a plan and then write the PYTHON code that is needed for the task to be completed. - It is possible that the code you are writing is for one step at a time in the plan. This will ensure proper execution of the task. - Your operations must be precise and efficient, adhering to the guidelines provided below: - 1. **Asynchronous Code Execution**: Your tasks will often be asynchronous in nature, requiring careful handling. Wrap asynchronous operations within an appropriate async structure to ensure smooth execution. - 2. **Sequential Task Execution**: To avoid issues related to navigation timing, execute your actions in a sequential order. This method ensures that each step is completed before the next one begins, maintaining the integrity of your workflow. Some steps like navigating to a site will require a small amount of wait time after them to ensure they load correctly. - 3. **Error Handling and Debugging**: Implement error handling to manage exceptions gracefully. 
Should an error occur or if the task doesn't complete as expected, review your code, adjust as necessary, and retry. Use the console or logging for debugging purposes to track the progress and issues. - 4. **Using HTML DOM**: Do not assume what a DOM selector (web elements) might be. Rather, fetch the DOM to look for the selectors or fetch DOM inner text to answer a questions. This is crucial for accurate task execution. When you fetch the DOM, reason about its content to determine appropriate selectors or text that should be extracted. To fetch the DOM using playwright you can: - - Fetch entire DOM using page.content() method. In the fetched DOM, consider if appropriate to remove entire sections of the DOM like `script`, `link` elements - - Fetch DOM inner text only text_content = await page.evaluate("() => document.body.innerText || document.documentElement.innerText"). This is useful for information retrieval. - 5. **DOM Handling**: Never ever substring the extracted HTML DOM. You can remove entire sections/elements of the DOM like `script`, `link` elements if they are not needed for the task. This is crucial for accurate task execution. - 6. **Execution Verification**: After executing the user the given code, ensure that you verify the completion of the task. If the task is not completed, revise your plan then rewrite the code for that step. - 7. **Termination Protocol**: Once a task is verified as complete or if it's determined that further attempts are unlikely to succeed, conclude the operation and respond with `##TERMINATE##`, to indicate the end of the session. This signal should only be used when the task is fully completed or if there's a consensus that continuation is futile. - 8. **Code Modification and Retry Strategy**: If your initial code doesn't achieve the desired outcome, revise your approach based on the insights gained during the process. 
When DOM selectors you are using fail, fetch the DOM and reason about it to discover the right selectors.If there are timeouts, adjust increase times. Add other error handling mechanisms before retrying as needed. - 9. **Code Generation**: Generated code does not need documentation or usage examples. Assume that it is being executed by an autonomous agent acting on behalf of the user. Do not add placeholders in the code. - 10. **Browser Handling**: Do not user headless mode with playwright. Do not close the browser after every step or even after task completion. Leave it open. - 11. **Reponse**: Remember that you are communicating with an autonomous agent that does not reason. All it does is execute code. Only respond with code that it can execute unless you are terminating. - 12. **Playwrite Oddities**: There are certain things that Playwright does not do well: - - page.wait_for_selector: When providing a timeout value, it will almost always timeout. Put that call in a try/except block and catch the timeout. If timeout occurs just move to the next statement in the code and most likely it will work. For example, if next statement is page.fill, just execute it. - - By following these guidelines, you will enhance the efficiency, reliability, and user interaction of your web navigation tasks. - Always aim for clear, concise, and well-structured code that aligns with best practices in asynchronous programming and web automation. - """, + "USER_AGENT_PROMPT": """A proxy for the user for executing the user commands.""", + "BROWSER_NAV_EXECUTOR_PROMPT": """A proxy for the user for executing the user commands.""", + + "PLANNER_AGENT_PROMPT": """You are a web automation task planner. You will receive tasks from the user and will work with a naive helper to accomplish it. + When the task is ambigious, use the get_user_input skill to ask the user for more information. You will not make any assumptions. 
+You will think step by step and break down the tasks into a sequence of simple subtasks. Subtasks will be delegated to the helper to execute. + +Return Format: +Your reply will strictly be a well-formatted JSON with four attributes. +"plan": This contains the high-level plan. This is optional and needs to be present only when a task starts and when the plan needs to be revised. +"next_step": A detailed next step consistent with the plan. The next step will be delegated to the helper to execute. This needs to be present for every response except when terminating. +"terminate": yes/no. Return yes when the exact task is complete without any compromises or you are absolutely convinced that the task cannot be completed, no otherwise. This is mandatory for every response. +"final_response": This is the final answer that will be returned to the user. In search tasks, unless explicitly stated, you will provide the single best suited result in the response instead of listing multiple options. This attribute only needs to be present when terminate is yes. + +Capabilities and limitations of the helper: +1. Helper can navigate to urls, perform simple interactions on a page or answer any question you may have about the current page. +2. Helper cannot perform complex planning, reasoning or analysis. You will not delegate any such tasks to helper, instead you will perform them based on information from the helper. +3. Helper is stateless and treats each step as a new task. Helper will not remember previous pages or actions. So, you will provide all necessary information as part of each step. +4. Very Important: Helper cannot go back to previous pages. If you need the helper to return to a previous page, you must explicitly add the URL of the previous page in the step (e.g. return to the search result page by navigating to the url https://www.google.com/search?q=Finland) + +Guidelines: + +1. If you know a URL, you can provide it to the helper to navigate to a new page (e.g. 
go to www.espn.com). +2. Do not assume any capability exists on the webpage. Ask questions to the helper to confirm the presence of features (e.g. is there a sort by price feature available on the page?). This will help you revise the plan as needed and also establish common ground with the helper. +3. Do not combine multiple steps into one. A step should be strictly as simple as interacting with a single element or navigating to a page. If you need to interact with multiple elements or perform multiple actions, you will break it down into multiple steps. +4. Important: You will NOT ask for any URLs of hyperlinks in the page from the helper, instead you will simply ask the helper to click on specific result. URL of the current page will be automatically provided to you with each helper response. +5. Very Important: Add verification as part of the plan, after each step and specifically before terminating to ensure that the task is completed successfully. Ask simple questions to verify the step completion (e.g. Can you confirm that White Nothing Phone 2 with 16GB RAM is present in the cart?). Do not assume the helper has performed the task correctly. +6. If the task requires multiple informations, all of them are equally important and should be gathered before terminating the task. You will strive to meet all the requirements of the task. +7. If one plan fails, you MUST revise the plan and try a different approach. You will NOT terminate a task untill you are absolutely convinced that the task is impossible to accomplish. + +Complexities of web navigation: +1. Many forms have mandatory fields that need to be filled up before they can be submitted. Ask the helper for what fields look mandatory. +2. In many websites, there are multiple options to filter or sort results. Ask the helper to list any elements on the page which will help the task (e.g. are there any links or interactive elements that may lead me to the support page?). +3. 
Always keep in mind complexities such as filtering, advanced search, sorting, and other features that may be present on the website. Ask the helper whether these features are available on the page when relevant and use them when the task requires it. +4. Very often list of items such as, search results, list of products, list of reviews, list of people etc. may be divided into multiple pages. If you need complete information, it is critical to explicitly ask the helper to go through all the pages. +5. Sometimes search capabilities available on the page will not yield the optimal results. Revise the search query to either more specific or more generic. +6. When a page refreshes or navigates to a new page, information entered in the previous page may be lost. Check that the information needs to be re-entered (e.g. what are the values in source and destination on the page?). +7. Sometimes some elements may not be visible or be disabled until some other action is performed. Ask the helper to confirm if there are any other fields that may need to be interacted for elements to appear or be enabled. + +Example 1: +Task: Find the cheapest store to buy Nothing Phone 2 (128GB). Current Page:www.google.com +Your Reply: +{"plan": "1. Search for "Buy Nothing Phone 2 (128Gb)" on Google. +2. Confirm that you are on the google search results page for "Buy Nothing Phone 2 (128GB)". +3. List the titles of all the search results from the current google search results page. +4. Click on the first link titled from the search results page +5. Confirm that you are on the Nothing phone 2 (128Gb) product page of the online store <name>. +6. Extract the price and availability of the Nothing Phone 2 (128GB) from the current product page. +7. Return to google search results page by navigating to the url https://www.google.com/search?q=Buy+Nothing+Phone+2+(128GB). +8. Confirm that you are on the google search results page for "Buy Nothing Phone 2 (128GB)". +9. 
Click on the second link titled <title> from the search results page +10. Continue until you have extracted the availability and price of Nothing Phone 2 (128GB) from all the online stores listed on the page.", +"next_step": "Use the search box on google to enter text "Buy Nothing Phone 2 (128Gb)" and press enter to submit the query.", +"terminate":"no"} + +After the task is completed and when terminating: +Your reply: {"terminate":"yes", "final_response": "Here is the Nothing phone 2 price list: <price list>. The cheapest store is <store name> with price <price>."} + +Example 2: +Task: Find the cheapest premium economy flights from Helsinki to Stockholm on 15 March. Current page: www.skyscanner.com +{"plan":"1. List the interaction options available on skyscanner page relevant for flight reservation along with their default values. +2. Select the journey option to one-way (if not default). +3. Set number of passengers to 1 (if not default). +4. Set the departure date to 15 March 2025 (since 15 March 2024 is already past). +5. Set ticket type to Premium Economy. +6. Set from airport to Helsinki. +7. Set destination airport to Stockholm. +8. Confirm that current values in the source airport, destination airport and departure date fields are Helsinki, Stockholm and 15 March 2025 respectively. +9. Click on the search button to get the search results. +10. Confirm that you are on the search results page. +11. Extract the price of the cheapest flight from Helsinki to Stockholm from the search results.", +"next_step": "List all interaction options available on this skyscanner page relevant for flight reservation. This could be source airport, destination airport etc. Also provide the current default values of the fields.", +"terminate":"no"}, +Notice above how there is confirmation after each step and how interaction (e.g. setting source and destination) with each element is a separate step. Follow the same pattern. 
+ +Remember: you are a very very persistent planner who will try every possible strategy to accomplish the task perfectly. +Revise search query if needed, ask for more information if needed, and always verify the results before terminating the task. +Some basic information about the user: $basic_user_information""", + + "BROWSER_AGENT_PROMPT": """You will perform web navigation tasks, which may include logging into websites and interacting with any web content using the functions made available to you. + Use the provided DOM representation for element location or text summarization. + Interact with pages using only the "mmid" attribute in DOM elements. + You must extract mmid value from the fetched DOM, do not conjure it up. + Execute function sequentially to avoid navigation timing issues. Once a task is completed, confirm completion with ##TERMINATE TASK##. + The given actions are NOT parallelizable. They are intended for sequential execution. + If you need to call multiple functions in a task step, call one function at a time. Wait for the function's response before invoking the next function. This is important to avoid collision. + Strictly for search fields, submit the field by pressing Enter key. For other forms, click on the submit button. + Unless otherwise specified, the task must be performed on the current page. Use openurl only when explicitly instructed to navigate to a new page with a url specified. If you do not know the URL ask for it. + You will NOT provide any URLs of links on webpage. If user asks for URLs, you will instead provide the text of the hyperlink on the page and offer to click on it. This is very very important. + When inputing information, remember to follow the format of the input field. For example, if the input field is a date field, you will enter the date in the correct format (e.g. YYYY-MM-DD), you may get clues from the placeholder text in the input field. 
+ if the task is ambigous or there are multiple options to choose from, you will ask the user for clarification. You will not make any assumptions. + Individual function will reply with action success and if any changes were observed as a consequence. Adjust your approach based on this feedback. + Once the task is completed or cannot be completed, return a short summary of the actions you performed to accomplish the task, and what worked and what did not. This should be followed by ##TERMINATE TASK##. Your reply will not contain any other information. + Additionally, If task requires an answer, you will also provide a short and precise answer followed by ##TERMINATE TASK##. + Ensure that user questions are answered from the DOM and not from memory or assumptions. To answer a question about textual information on the page, prefer to use text_only DOM type. To answer a question about interactive elements, use all_fields DOM type. + Do not provide any mmid values in your response. + Important: If you encounter an issues or is unsure how to proceed, simply ##TERMINATE TASK## and provide a detailed summary of the exact issue encountered. + Do not repeat the same action multiple times if it fails. Instead, if something did not work after a few attempts, terminate the task.""", + + + "VERFICATION_AGENT": """Given a conversation and a task, your task is to analyse the conversation and tell if the task is completed. If not, you need to tell what is not completed and suggest next steps to complete the task.""", + "ENTER_TEXT_AND_CLICK_PROMPT": """This skill enters text into a specified element and clicks another element, both identified by their DOM selector queries. + Ideal for seamless actions like submitting search queries, this integrated approach ensures superior performance over separate text entry and click commands. 
+ Successfully completes when both actions are executed without errors, returning True; otherwise, it provides False or an explanatory message of any failure encountered. + Always prefer this dual-action skill for tasks that combine text input and element clicking to leverage its streamlined operation.""", + + + "OPEN_URL_PROMPT": """Opens a specified URL in the web browser instance. Returns url of the new page if successful or appropriate error message if the page could not be opened.""", + + + "GO_BACK_PROMPT": """Goes back to previous page in the browser history. Useful when correcting an incorrect action that led to a new page or when needing to revisit a previous page for information. Returns the full URL of the page after the back action is performed.""", + + + "COMMAND_EXECUTION_PROMPT": """Execute the user task "$command" $current_url_prompt_segment""", + + + "GET_USER_INPUT_PROMPT": """Get clarification by asking the user or wait for user to perform an action on webpage. This is useful e.g. when you encounter a login or captcha and requires the user to intervene. This skill will also be useful when task is ambigious and you need more clarification from the user (e.g. ["which source website to use to accomplish a task"], ["Enter your credentials on your webpage and type done to continue"]). Use this skill very sparingly and only when absolutely needed.""", + + + "GET_DOM_WITHOUT_CONTENT_TYPE_PROMPT": """Retrieves the DOM of the current web browser page. + Each DOM element will have an \"mmid\" attribute injected for ease of DOM interaction. + Returns a minified representation of the HTML DOM where each HTML DOM Element has an attribute called \"mmid\" for ease of DOM query selection. When \"mmid\" attribute is available, use it for DOM query selectors.""", + + + # This one below had all three content types including input_fields + "GET_DOM_WITH_CONTENT_TYPE_PROMPT": """Retrieves the DOM of the current web site based on the given content type. 
+ The DOM representation returned contains items ordered in the same way they appear on the page. Keep this in mind when executing user requests that contain ordinals or numbered items. + text_only - returns plain text representing all the text in the web site. Use this for any information retrieval task. This will contain the most complete textual information. + input_fields - returns a JSON string containing a list of objects representing text input html elements with mmid attribute. Use this strictly for interaction purposes with text input fields. + all_fields - returns a JSON string containing a list of objects representing all interactive elements and their attributes with mmid attribute. Use this strictly to identify and interact with any type of elements on page. + If information is not available in one content type, you must try another content_type.""", + + + "GET_ACCESSIBILITY_TREE": """Retrieves the accessibility tree of the current web site. + The DOM representation returned contains items ordered in the same way they appear on the page. Keep this in mind when executing user requests that contain ordinals or numbered items.""", + + + "CLICK_PROMPT": """Executes a click action on the element matching the given mmid attribute value. It is best to use mmid attribute as the selector. + Returns Success if click was successful or appropriate error message if the element could not be clicked.""", + + + "CLICK_PROMPT_ACCESSIBILITY": """Executes a click action on the element a name and role. + Returns Success if click was successful or appropriate error message if the element could not be clicked.""", + + + "GET_URL_PROMPT": """Get the full URL of the current web page/site. If the user command seems to imply an action that would be suitable for an already open website in their browser, use this to fetch current website URL.""", + + + "ENTER_TEXT_PROMPT": """Single enter given text in the DOM element matching the given mmid attribute value. 
This will only enter the text and not press enter or anything else. + Returns Success if text entry was successful or appropriate error message if text could not be entered.""", + + + "CLICK_BY_TEXT_PROMPT": """Executes a click action on the element matching the text. If multiple text matches are found, it will click on all of them. Use this as last resort when all else fails.""", + + "BULK_ENTER_TEXT_PROMPT": """Bulk enter text in multiple DOM fields. To be used when there are multiple fields to be filled on the same page. + Enters text in the DOM elements matching the given mmid attribute value. + The input will receive a list of objects containing the DOM query selector and the text to enter. + This will only enter the text and not press enter or anything else. + Returns each selector and the result for attempting to enter text.""", + + + "PRESS_KEY_COMBINATION_PROMPT": """Presses the given key on the current web page. + This is useful for pressing the enter button to submit a search query, PageDown to scroll, ArrowDown to change selection in a focussed list etc.""", + + + "ADD_TO_MEMORY_PROMPT": """"Save any information that you may need later in this term memory. This could be useful for saving things to do, saving information for personalisation, or even saving information you may need in future for efficiency purposes E.g. Remember to call John at 5pm, This user likes Tesla company and considered buying shares, The user enrollment form is available in <url> etc.""", + + "HOVER_PROMPT": """Hover on a element with the given mmid attribute value. Hovering on an element can reveal additional information such as a tooltip or trigger a dropdown menu with different navigation options.""", + "GET_MEMORY_PROMPT": """Retrieve all the information previously stored in the memory""", + + + "PRESS_ENTER_KEY_PROMPT": """Presses the enter key in the given html field. 
This is most useful on text input fields.""", + + + "EXTRACT_TEXT_FROM_PDF_PROMPT": """Extracts text from a PDF file hosted at the given URL.""", + + + "BROWSER_AGENT_NO_SKILLS_PROMPT": """You are an autonomous agent tasked with performing web navigation on a Playwright instance, including logging into websites and executing other web-based actions. + You will receive user commands, formulate a plan and then write the PYTHON code that is needed for the task to be completed. + It is possible that the code you are writing is for one step at a time in the plan. This will ensure proper execution of the task. + Your operations must be precise and efficient, adhering to the guidelines provided below: + 1. **Asynchronous Code Execution**: Your tasks will often be asynchronous in nature, requiring careful handling. Wrap asynchronous operations within an appropriate async structure to ensure smooth execution. + 2. **Sequential Task Execution**: To avoid issues related to navigation timing, execute your actions in a sequential order. This method ensures that each step is completed before the next one begins, maintaining the integrity of your workflow. Some steps like navigating to a site will require a small amount of wait time after them to ensure they load correctly. + 3. **Error Handling and Debugging**: Implement error handling to manage exceptions gracefully. Should an error occur or if the task doesn't complete as expected, review your code, adjust as necessary, and retry. Use the console or logging for debugging purposes to track the progress and issues. + 4. **Using HTML DOM**: Do not assume what a DOM selector (web elements) might be. Rather, fetch the DOM to look for the selectors or fetch DOM inner text to answer a questions. This is crucial for accurate task execution. When you fetch the DOM, reason about its content to determine appropriate selectors or text that should be extracted. 
To fetch the DOM using playwright you can: + - Fetch entire DOM using page.content() method. In the fetched DOM, consider if appropriate to remove entire sections of the DOM like `script`, `link` elements + - Fetch DOM inner text only text_content = await page.evaluate("() => document.body.innerText || document.documentElement.innerText"). This is useful for information retrieval. + 5. **DOM Handling**: Never ever substring the extracted HTML DOM. You can remove entire sections/elements of the DOM like `script`, `link` elements if they are not needed for the task. This is crucial for accurate task execution. + 6. **Execution Verification**: After executing the user the given code, ensure that you verify the completion of the task. If the task is not completed, revise your plan then rewrite the code for that step. + 7. **Termination Protocol**: Once a task is verified as complete or if it's determined that further attempts are unlikely to succeed, conclude the operation and respond with `##TERMINATE##`, to indicate the end of the session. This signal should only be used when the task is fully completed or if there's a consensus that continuation is futile. + 8. **Code Modification and Retry Strategy**: If your initial code doesn't achieve the desired outcome, revise your approach based on the insights gained during the process. When DOM selectors you are using fail, fetch the DOM and reason about it to discover the right selectors.If there are timeouts, adjust increase times. Add other error handling mechanisms before retrying as needed. + 9. **Code Generation**: Generated code does not need documentation or usage examples. Assume that it is being executed by an autonomous agent acting on behalf of the user. Do not add placeholders in the code. + 10. **Browser Handling**: Do not user headless mode with playwright. Do not close the browser after every step or even after task completion. Leave it open. + 11. 
**Reponse**: Remember that you are communicating with an autonomous agent that does not reason. All it does is execute code. Only respond with code that it can execute unless you are terminating. + 12. **Playwrite Oddities**: There are certain things that Playwright does not do well: + - page.wait_for_selector: When providing a timeout value, it will almost always timeout. Put that call in a try/except block and catch the timeout. If timeout occurs just move to the next statement in the code and most likely it will work. For example, if next statement is page.fill, just execute it. + + + By following these guidelines, you will enhance the efficiency, reliability, and user interaction of your web navigation tasks. + Always aim for clear, concise, and well-structured code that aligns with best practices in asynchronous programming and web automation. + """, } diff --git a/ae/core/skills/__init__.py b/ae/core/skills/__init__.py index 9931cf7..4ecf9f1 100644 --- a/ae/core/skills/__init__.py +++ b/ae/core/skills/__init__.py @@ -15,6 +15,4 @@ from ae.core.skills.get_user_input import get_user_input from ae.core.skills.open_url import openurl -from ae.core.skills.press_key_combination import do_press_key_combination -from ae.core.skills.press_key_combination import press_enter_key from ae.core.skills.press_key_combination import press_key_combination \ No newline at end of file diff --git a/ae/core/skills/click_using_selector.py b/ae/core/skills/click_using_selector.py index 887f3ac..dfbf4e5 100644 --- a/ae/core/skills/click_using_selector.py +++ b/ae/core/skills/click_using_selector.py @@ -8,12 +8,13 @@ from ae.core.playwright_manager import PlaywrightManager from ae.utils.dom_helper import get_element_outer_html -from ae.utils.dom_mutation_observer import subscribe -from ae.utils.dom_mutation_observer import unsubscribe +from ae.utils.dom_mutation_observer import subscribe # type: ignore +from ae.utils.dom_mutation_observer import unsubscribe # type: ignore from 
ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType -async def click(selector: Annotated[str, "The properly formed query selector string to identify the element for the click action. When \"mmid\" attribute is present, use it for the query selector."], +async def click(selector: Annotated[str, "The properly formed query selector string to identify the element for the click action (e.g. [mmid='114']). When \"mmid\" attribute is present, use it for the query selector."], wait_before_execution: Annotated[float, "Optional wait time in seconds before executing the click event logic.", float] = 0.0) -> Annotated[str, "A message indicating success or failure of the click."]: """ Executes a click action on the element matching the given query selector string within the currently open web page. @@ -35,7 +36,7 @@ async def click(selector: Annotated[str, "The properly formed query selector str if page is None: # type: ignore raise ValueError('No active page found. OpenURL command opens a new page.') - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) @@ -51,15 +52,10 @@ def detect_dom_changes(changes:str): # type: ignore await asyncio.sleep(0.1) # sleep for 100ms to allow the mutation observer to detect changes unsubscribe(detect_dom_changes) await browser_manager.take_screenshots(f"{function_name}_end", page) - await browser_manager.notify_user(result["summary_message"]) + await browser_manager.notify_user(result["summary_message"], message_type=MessageType.ACTION) if dom_changes_detected: - return f"{result['detailed_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. Get all_fields to interact with the elements." 
- return result["detailed_message"] - - - result = await do_click(page, selector, wait_before_execution) - await browser_manager.notify_user(result["summary_message"]) + return f"Success: {result['summary_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. This means that the action to click {selector} is not yet executed and needs further interaction. Get all_fields DOM to complete the interaction." return result["detailed_message"] @@ -109,7 +105,7 @@ async def do_click(page: Page, selector: str, wait_before_execution: float) -> d element_tag_name = await element.evaluate("element => element.tagName.toLowerCase()") element_outer_html = await get_element_outer_html(element, page, element_tag_name) - + if element_tag_name == "option": element_value = await element.get_attribute("value") # get the text that is in the value of the option @@ -118,16 +114,15 @@ async def do_click(page: Page, selector: str, wait_before_execution: float) -> d await parent_element.select_option(value=element_value) # type: ignore logger.info(f'Select menu option "{element_value}" selected') - - return {"summary_message": f'Select menu option "{element_value}" selected', + + return {"summary_message": f'Select menu option "{element_value}" selected', "detailed_message": f'Select menu option "{element_value}" selected. The select element\'s outer HTML is: {element_outer_html}.'} - - await element.focus() + + #Playwright click seems to fail more often than not, disabling it for now and just going with JS click #await perform_playwright_click(element, selector) - await perform_javascript_click(page, selector) - msg = f"Element with selector: \"{selector}\" clicked." 
- return {"summary_message": msg, "detailed_message": f"{msg} The clicked element's outer HTML is: {element_outer_html}."} + msg = await perform_javascript_click(page, selector) + return {"summary_message": msg, "detailed_message": f"{msg} The clicked element's outer HTML is: {element_outer_html}."} # type: ignore except Exception as e: logger.error(f"Unable to click element with selector: \"{selector}\". Error: {e}") traceback.print_exc() @@ -202,7 +197,12 @@ async def perform_javascript_click(page: Page, selector: str): if (element.tagName.toLowerCase() === "a") { element.target = "_self"; } + let ariaExpandedBeforeClick = element.getAttribute('aria-expanded'); element.click(); + let ariaExpandedAfterClick = element.getAttribute('aria-expanded'); + if (ariaExpandedBeforeClick === 'false' && ariaExpandedAfterClick === 'true') { + return "Executed JavaScript Click on element with selector: "+selector +". Very important: As a consequence a menu has appeared where you may need to make further selction. 
Very important: Get all_fields DOM to complete the action."; + } return "Executed JavaScript Click on element with selector: "+selector; } }""" diff --git a/ae/core/skills/enter_text_and_click.py b/ae/core/skills/enter_text_and_click.py index 9996416..dd2a926 100644 --- a/ae/core/skills/enter_text_and_click.py +++ b/ae/core/skills/enter_text_and_click.py @@ -7,6 +7,7 @@ from ae.core.skills.enter_text_using_selector import do_entertext from ae.core.skills.press_key_combination import do_press_key_combination from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType async def enter_text_and_click( @@ -46,12 +47,12 @@ async def enter_text_and_click( await browser_manager.highlight_element(text_selector, True) - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) text_entry_result = await do_entertext(page, text_selector, text_to_enter, use_keyboard_fill=True) - await browser_manager.notify_user(text_entry_result["summary_message"]) + #await browser_manager.notify_user(text_entry_result["summary_message"]) if not text_entry_result["summary_message"].startswith("Success"): await browser_manager.take_screenshots(f"{function_name}_end", page) return(f"Failed to enter text '{text_to_enter}' into element with selector '{text_selector}'. Check that the selctor is valid.") @@ -63,16 +64,16 @@ async def enter_text_and_click( do_press_key_combination_result = await do_press_key_combination(browser_manager, page, "Enter") if do_press_key_combination_result: result["detailed_message"] += f" Instead of click, pressed the Enter key successfully on element: \"{click_selector}\"." 
- await browser_manager.notify_user(f"Pressed the Enter key successfully on element: \"{click_selector}\".") + await browser_manager.notify_user(f"Pressed the Enter key successfully on element: \"{click_selector}\".", message_type=MessageType.ACTION) else: result["detailed_message"] += f" Clicking the same element after entering text in it, is of no value. Tried pressing the Enter key on element \"{click_selector}\" instead of click and failed." - await browser_manager.notify_user("Failed to press the Enter key on element \"{click_selector}\".") + await browser_manager.notify_user(f"Failed to press the Enter key on element \"{click_selector}\".", message_type=MessageType.ACTION) else: await browser_manager.highlight_element(click_selector, True) do_click_result = await do_click(page, click_selector, wait_before_click_execution) result["detailed_message"] += f' {do_click_result["detailed_message"]}' - await browser_manager.notify_user(do_click_result["summary_message"]) + #await browser_manager.notify_user(do_click_result["summary_message"]) await asyncio.sleep(0.1) # sleep for 100ms to allow the mutation observer to detect changes diff --git a/ae/core/skills/enter_text_using_selector.py b/ae/core/skills/enter_text_using_selector.py index fe369bb..0491092 100644 --- a/ae/core/skills/enter_text_using_selector.py +++ b/ae/core/skills/enter_text_using_selector.py @@ -13,6 +13,7 @@ from ae.utils.dom_mutation_observer import subscribe from ae.utils.dom_mutation_observer import unsubscribe from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType @dataclass @@ -62,11 +63,12 @@ async def custom_fill_element(page: Page, selector: str, text_to_enter: str): selector = f"{selector}" # Ensures the selector is treated as a string await page.evaluate("""(inputParams) => { const selector = inputParams.selector; - const text_to_enter = inputParams.text_to_enter; + let text_to_enter = inputParams.text_to_enter; + text_to_enter = text_to_enter.trim(); 
document.querySelector(selector).value = text_to_enter; }""", {"selector": selector, "text_to_enter": text_to_enter}) -async def entertext(entry: Annotated[EnterTextEntry, "An object containing 'query_selector' (DOM selector query using mmid attribute) and 'text' (text to enter on the element)."]) -> Annotated[str, "Explanation of the outcome of this operation."]: +async def entertext(entry: Annotated[EnterTextEntry, "An object containing 'query_selector' (DOM selector query using mmid attribute e.g. [mmid='114']) and 'text' (text to enter on the element)."]) -> Annotated[str, "Explanation of the outcome of this operation."]: """ Enters text into a DOM element identified by a CSS selector. @@ -106,11 +108,12 @@ async def entertext(entry: Annotated[EnterTextEntry, "An object containing 'quer if page is None: # type: ignore return "Error: No active page found. OpenURL command opens a new page." - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) await browser_manager.highlight_element(query_selector, True) + dom_changes_detected=None def detect_dom_changes(changes:str): # type: ignore nonlocal dom_changes_detected @@ -124,13 +127,13 @@ def detect_dom_changes(changes:str): # type: ignore await browser_manager.take_screenshots(f"{function_name}_end", page) - await browser_manager.notify_user(result["summary_message"]) + await browser_manager.notify_user(result["summary_message"], message_type=MessageType.ACTION) if dom_changes_detected: - return f"{result['detailed_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. Get all_fields to interact with the elements." + return f"{result['detailed_message']}.\n As a consequence of this action, new elements have appeared in view: {dom_changes_detected}. 
This means that the action of entering text {text_to_enter} is not yet executed and needs further interaction. Get all_fields DOM to complete the interaction." return result["detailed_message"] -async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboard_fill: bool=False): +async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboard_fill: bool=True): """ Performs the text entry operation on a DOM element. @@ -157,6 +160,7 @@ async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboa - If 'use_keyboard_fill' is set to False, the function uses the 'custom_fill_element' method to enter the text. """ try: + logger.debug(f"Looking for selector {selector} to enter text: {text_to_enter}") elem = await page.query_selector(selector) @@ -170,19 +174,19 @@ async def do_entertext(page: Page, selector: str, text_to_enter: str, use_keyboa if use_keyboard_fill: await elem.focus() + await asyncio.sleep(0.1) await press_key_combination("Control+A") await asyncio.sleep(0.1) await press_key_combination("Backspace") + await asyncio.sleep(0.1) logger.debug(f"Focused element with selector {selector} to enter text") - await page.keyboard.type(text_to_enter, delay=2) + #type the text with a 1ms delay between key presses + await page.keyboard.type(text_to_enter, delay=1) else: await custom_fill_element(page, selector, text_to_enter) - logger.info(f"Success. Text \"{text_to_enter}\" set successfully in the element with selector {selector}") await elem.focus() - await page.keyboard.type("") # some html pages can have placeholders that only disappear upon keyboard input - await asyncio.sleep(1) + logger.info(f"Success. Text \"{text_to_enter}\" set successfully in the element with selector {selector}") success_msg = f"Success. 
Text \"{text_to_enter}\" set successfully in the element with selector {selector}" - return {"summary_message": success_msg, "detailed_message": f"{success_msg} and outer HTML: {element_outer_html}."} except Exception as e: diff --git a/ae/core/skills/get_dom_with_content_type.py b/ae/core/skills/get_dom_with_content_type.py index c60c2ba..8d0de3e 100644 --- a/ae/core/skills/get_dom_with_content_type.py +++ b/ae/core/skills/get_dom_with_content_type.py @@ -10,6 +10,7 @@ from ae.utils.dom_helper import wait_for_non_loading_dom_state from ae.utils.get_detailed_accessibility_tree import do_get_accessibility_info from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType async def get_dom_with_content_type( @@ -73,7 +74,7 @@ async def get_dom_with_content_type( elapsed_time = time.time() - start_time logger.info(f"Get DOM Command executed in {elapsed_time} seconds") - await browser_manager.notify_user(user_success_message) + await browser_manager.notify_user(user_success_message, message_type=MessageType.ACTION) return extracted_data # type: ignore @@ -81,7 +82,7 @@ async def get_filtered_text_content(page: Page) -> str: text_content = await page.evaluate(""" () => { // Array of query selectors to filter out - const selectorsToFilter = ['#agentDriveAutoOverlay']; + const selectorsToFilter = ['#agente-overlay']; // Store the original visibility values to revert later const originalStyles = []; @@ -101,6 +102,7 @@ async def get_filtered_text_content(page: Page) -> str: // Get all the alt text from images on the page let altTexts = Array.from(document.querySelectorAll('img')).map(img => img.alt); altTexts="Other Alt Texts in the page: " + altTexts.join(' '); + // Revert the visibility changes originalStyles.forEach(entry => { entry.element.style.visibility = entry.originalStyle; @@ -109,4 +111,5 @@ async def get_filtered_text_content(page: Page) -> str: return textContent; } """) - return text_content \ No newline at end of file + return 
text_content + diff --git a/ae/core/skills/get_url.py b/ae/core/skills/get_url.py index f26323a..343c3da 100644 --- a/ae/core/skills/get_url.py +++ b/ae/core/skills/get_url.py @@ -1,7 +1,6 @@ from typing import Annotated from ae.core.playwright_manager import PlaywrightManager -from ae.utils.logger import logger async def geturl() -> Annotated[str, "Returns the full URL of the current active web site/page."]: @@ -14,7 +13,7 @@ async def geturl() -> Annotated[str, "Returns the full URL of the current active - Full URL the browser's active page. """ - logger.info("Executing Get URL Command") + try: # Create and use the PlaywrightManager browser_manager = PlaywrightManager(browser_type='chromium', headless=False) @@ -23,10 +22,19 @@ async def geturl() -> Annotated[str, "Returns the full URL of the current active if not page: raise ValueError('No active page found. OpenURL command opens a new page.') + await page.wait_for_load_state("domcontentloaded") + # Get the URL of the current page - url = page.url - logger.debug("Returning URL: "+url) - await browser_manager.notify_user("Grabbed the URL of the current page.") - return url + try: + title = await page.title() + current_url = page.url + if len(current_url) >250: + current_url = current_url[:250] + "..." + return f"Current Page: {current_url}, Title: {title}" # type: ignore + except: # noqa: E722 + current_url = page.url + return f"Current Page: {current_url}" + except Exception as e: raise ValueError('No active page found. 
OpenURL command opens a new page.') from e + diff --git a/ae/core/skills/get_user_input.py b/ae/core/skills/get_user_input.py index 9fcfb49..df72ac2 100644 --- a/ae/core/skills/get_user_input.py +++ b/ae/core/skills/get_user_input.py @@ -15,6 +15,7 @@ async def get_user_input(questions: Annotated[List[str], "List of questions to a Returns: - Newline separated list of questions to ask the user """ + answers: dict[str, str] = {} browser_manager = PlaywrightManager(browser_type='chromium', headless=False) if browser_manager.ui_manager: diff --git a/ae/core/skills/open_url.py b/ae/core/skills/open_url.py index d3b15e6..1acd5e8 100644 --- a/ae/core/skills/open_url.py +++ b/ae/core/skills/open_url.py @@ -3,8 +3,8 @@ from ae.core.playwright_manager import PlaywrightManager from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType -#Annotated[Page, "The page instance that navigated to the specified URL."] async def openurl(url: Annotated[str, "The URL to navigate to. Value must include the protocol (http:// or https://)."], timeout: Annotated[int, "Additional wait time in seconds after initial load."] = 3) -> Annotated[str, "Returns the result of this request in text form"]: @@ -20,12 +20,11 @@ async def openurl(url: Annotated[str, "The URL to navigate to. Value must includ - URL of the new page. """ logger.info(f"Opening URL: {url}") - browser_manager = PlaywrightManager(browser_type='chromium', headless=False) await browser_manager.get_browser_context() page = await browser_manager.get_current_page() # Navigate to the URL with a short timeout to ensure the initial load starts - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore try: await browser_manager.take_screenshots(f"{function_name}_start", page) url = ensure_protocol(url) @@ -37,9 +36,11 @@ async def openurl(url: Annotated[str, "The URL to navigate to. 
Value must includ await browser_manager.take_screenshots(f"{function_name}_end", page) - await browser_manager.notify_user(f"Opened URL: {url}") - return f"Page loaded: {page.url.split('?')[0]}" # type: ignore - + await browser_manager.notify_user(f"Opened URL: {url}", message_type=MessageType.ACTION) + # Get the page title + title = await page.title() + url=page.url + return f"Page loaded: {url}, Title: {title}" # type: ignore def ensure_protocol(url: str) -> str: """ diff --git a/ae/core/skills/pdf_text_extractor.py b/ae/core/skills/pdf_text_extractor.py index f3734b3..be05081 100644 --- a/ae/core/skills/pdf_text_extractor.py +++ b/ae/core/skills/pdf_text_extractor.py @@ -7,6 +7,7 @@ from ae.config import PROJECT_TEMP_PATH from ae.core.playwright_manager import PlaywrightManager from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType async def extract_text_from_pdf(pdf_url: Annotated[str, "The URL of the PDF file to extract text from."]) -> Annotated[str, "All the text found in the PDF file."]: @@ -35,7 +36,7 @@ async def extract_text_from_pdf(pdf_url: Annotated[str, "The URL of the PDF file text += page_text + "\n" extracted_text = text.strip() word_count = len(extracted_text.split()) - await browser_manager.notify_user(f"Extracted text from the PDF successfully. Found {word_count} words.") + await browser_manager.notify_user(f"Extracted text from the PDF successfully. 
Found {word_count} words.", message_type=MessageType.ACTION) return "Text found in the PDF:\n" + extracted_text except httpx.HTTPStatusError as e: logger.error(f"An error occurred while downloading the PDF from {pdf_url}: {str(e)}") diff --git a/ae/core/skills/press_key_combination.py b/ae/core/skills/press_key_combination.py index aec8f67..3660dab 100644 --- a/ae/core/skills/press_key_combination.py +++ b/ae/core/skills/press_key_combination.py @@ -1,15 +1,17 @@ +import asyncio import inspect -import time from typing import Annotated -from playwright.async_api import Page +from playwright.async_api import Page # type: ignore from ae.core.playwright_manager import PlaywrightManager -from ae.core.skills.click_using_selector import do_click +from ae.utils.dom_mutation_observer import subscribe # type: ignore +from ae.utils.dom_mutation_observer import unsubscribe # type: ignore from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType -async def press_key_combination(key_combination: Annotated[str, "The key combination to press using '+' as a separator, e.g., 'Control+C', Enter."]) -> str: +async def press_key_combination(key_combination: Annotated[str, "The key to press, e.g., Enter, PageDown etc"]) -> str: """ Presses a key combination on the current active page managed by PlaywrightManager. 
@@ -28,7 +30,6 @@ async def press_key_combination(key_combination: Annotated[str, "The key combina """ logger.info(f"Executing press_key_combination with key combo: {key_combination}") - start_time = time.time() # Create and use the PlaywrightManager browser_manager = PlaywrightManager() page = await browser_manager.get_current_page() @@ -39,6 +40,12 @@ async def press_key_combination(key_combination: Annotated[str, "The key combina # Split the key combination if it's a combination of keys keys = key_combination.split('+') + dom_changes_detected=None + def detect_dom_changes(changes:str): # type: ignore + nonlocal dom_changes_detected + dom_changes_detected = changes # type: ignore + + subscribe(detect_dom_changes) # If it's a combination, hold down the modifier keys for key in keys[:-1]: # All keys except the last one are considered modifier keys await page.keyboard.down(key) @@ -49,26 +56,14 @@ async def press_key_combination(key_combination: Annotated[str, "The key combina # Release the modifier keys for key in keys[:-1]: await page.keyboard.up(key) + await asyncio.sleep(0.1) # sleep for 100ms to allow the mutation observer to detect changes + unsubscribe(detect_dom_changes) - print(f"Operation completed in {time.time() - start_time} seconds.") - return f"Key combination {key_combination} executed successfully" - -async def press_enter_key(selector: Annotated[str, """The properly formed query selector string to identify the element to press enter key in. - When \"mmid\" attribute is present, use it for the query selector."""]) -> Annotated[str, "A message indicating success or failure."]: - logger.info(f"Executing press_enter_key with selector: \"{selector}\"") - browser_manager = PlaywrightManager(browser_type='chromium', headless=False) - page = await browser_manager.get_current_page() - - if page is None: # type: ignore - raise ValueError('No active page found. 
OpenURL command opens a new page.') + if dom_changes_detected: + return f"Key {key_combination} executed successfully.\n As a consequence of this action, new elements have appeared in view:{dom_changes_detected}. This means that the action is not yet executed and needs further interaction. Get all_fields DOM to complete the interaction." - await do_click(page, selector, wait_before_execution=0.0) - result = await do_press_key_combination(browser_manager, page, 'Enter') - - if result: - return f"Enter key pressed in field with selector: {selector}" - else: - return f"Failed to press Enter key in field with selector: {selector}" + await browser_manager.notify_user(f"Key {key_combination} executed successfully", message_type=MessageType.ACTION) + return f"Key {key_combination} executed successfully" async def do_press_key_combination(browser_manager: PlaywrightManager, page: Page, key_combination: str) -> bool: @@ -90,7 +85,7 @@ async def do_press_key_combination(browser_manager: PlaywrightManager, page: Pag logger.info(f"Executing press_key_combination with key combo: {key_combination}") try: - function_name = inspect.currentframe().f_code.co_name + function_name = inspect.currentframe().f_code.co_name # type: ignore await browser_manager.take_screenshots(f"{function_name}_start", page) # Split the key combination if it's a combination of keys keys = key_combination.split('+') @@ -113,3 +108,4 @@ async def do_press_key_combination(browser_manager: PlaywrightManager, page: Pag await browser_manager.take_screenshots(f"{function_name}_end", page) return True + diff --git a/ae/core/system_orchestrator.py b/ae/core/system_orchestrator.py index de98f27..31c4c73 100644 --- a/ae/core/system_orchestrator.py +++ b/ae/core/system_orchestrator.py @@ -7,6 +7,7 @@ from ae.config import SOURCE_LOG_FOLDER_PATH from ae.core.autogen_wrapper import AutogenWrapper from ae.utils.cli_helper import async_input # type: ignore +from ae.utils.http_helper import make_post_request from 
ae.utils.logger import logger @@ -24,7 +25,7 @@ class SystemOrchestrator: shutdown_event (asyncio.Event): Event to wait for an exit command to be processed. """ - def __init__(self, agent_scenario:str="user_proxy,browser_nav_agent", input_mode:str="GUI_ONLY"): + def __init__(self, agent_scenario:str="user,planner_agent,browser_nav_agent,browser_nav_executor", input_mode:str="GUI_ONLY"): """ Initializes the system orchestrator with the specified agent scenario and input mode. @@ -37,16 +38,34 @@ def __init__(self, agent_scenario:str="user_proxy,browser_nav_agent", input_mode self.browser_manager = None self.autogen_wrapper = None self.is_running = False + + if os.getenv('ORCHESTRATOR_API_KEY', None) is not None and os.getenv('ORCHESTRATOR_GATEWAY', None) is not None: + self.__populate_orchestrator_info() + logger.info(f"Orchestrator endpoint: {self.orchestrator_endpoint}") + else: + self.use_orchestrator = False + self.__parse_user_and_browser_agent_names() self.shutdown_event = asyncio.Event() #waits for an exit command to be processed + + def __populate_orchestrator_info(self): + """ + Populates the orchestrator information by retrieving the API key, gateway, and endpoint from environment variables. 
+ """ + self.orchestrator_api_key = os.getenv('ORCHESTRATOR_API_KEY') + self.orchestrator_gateway = os.getenv('ORCHESTRATOR_GATEWAY') + self.orchestrator_endpoint = f"{self.orchestrator_gateway}/api/orchestrate" + self.use_orchestrator = True + + def __parse_user_and_browser_agent_names(self): """ Parse the user and browser agent names from agent_scenario """ self.agent_names = self.agent_scenario.split(',') for agent_name in self.agent_names: - if 'user_proxy' in agent_name: + if 'user' in agent_name: self.ser_agent_name = agent_name else: self.browser_agent_name = agent_name @@ -92,6 +111,25 @@ async def receive_command(self, command: str): """ await self.process_command(command) + async def __orchestrate_command(self, command: str): + if not self.use_orchestrator: + return command + + orch_response = make_post_request(self.orchestrator_endpoint, {"query": command}, self.orchestrator_api_key, api_key_header_name="X-API-Key") # type: ignore + + if not orch_response: + return command + + if "user_notification" in orch_response: + await self.browser_manager.notify_user(orch_response["user_notification"]) # type: ignore + if "is_terminating" in orch_response and orch_response["is_terminating"]: + logger.info("Orchestrator indicated command execution completed.") + return None + if "reformulated_query" in orch_response: + logger.info(f"Orchestrator reformulated command to: {orch_response['reformulated_query']}") + return orch_response["reformulated_query"] + + async def process_command(self, command: str): """ Processes a given command, coordinating with the Autogen wrapper for execution and handling special commands like 'exit'. @@ -99,6 +137,7 @@ async def process_command(self, command: str): Args: command (str): The command to process. 
""" + logger.info(f"Received command: {command}") if command.lower() == 'exit': await self.shutdown() return @@ -107,15 +146,30 @@ async def process_command(self, command: str): self.is_running = True start_time = time.time() current_url = await self.browser_manager.get_current_url() if self.browser_manager else None + self.browser_manager.ui_manager.clear_conversation_history() # type: ignore self.browser_manager.log_user_message(command) # type: ignore - + result = None + logger.info(f"Processing command: {command}") if self.autogen_wrapper: - await self.autogen_wrapper.process_command(command, current_url) + await self.browser_manager.update_processing_state("processing") # type: ignore + orchestrated_command = await self.__orchestrate_command(command) + if orchestrated_command is not None: + result = await self.autogen_wrapper.process_command(orchestrated_command, current_url) + else: + result = await self.autogen_wrapper.process_command(command, current_url) + + await self.browser_manager.update_processing_state("done") # type: ignore end_time = time.time() elapsed_time = round(end_time - start_time, 2) logger.info(f"Command \"{command}\" took: {elapsed_time} seconds.") await self.save_chat_messages() - await self.browser_manager.notify_user(f"Completed ({elapsed_time}s).") # type: ignore + if result is not None: + chat_history= result.chat_history # type: ignore + last_message = chat_history[-1] if chat_history else None # type: ignore + if last_message and "terminate" in last_message and last_message["terminate"]=="yes": + await self.browser_manager.notify_user(last_message, "answer") # type: ignore + + await self.browser_manager.notify_user(f"Task Completed ({elapsed_time}s).", "info") # type: ignore await self.browser_manager.command_completed(command, elapsed_time) # type: ignore self.is_running = False diff --git a/ae/core/ui_manager.py b/ae/core/ui_manager.py index f3da06a..b86ee9c 100644 --- a/ae/core/ui_manager.py +++ b/ae/core/ui_manager.py @@ -1,5 
+1,4 @@ -import json import os import traceback @@ -9,6 +8,7 @@ from ae.config import PROJECT_SOURCE_ROOT from ae.utils.js_helper import escape_js_message from ae.utils.logger import logger +from ae.utils.ui_messagetype import MessageType class UIManager: @@ -24,6 +24,10 @@ class UIManager: """ overlay_is_collapsed: bool = True + + overlay_processing_state: str = "init" #init: initialised, processing: processing is ongoing, done: processing is done + overlay_show_details:bool = True + conversation_history:list[dict[str, str]] = [] __update_overlay_chat_history_running: bool = False @@ -51,10 +55,12 @@ async def handle_navigation(self, frame: Frame): # Inject the JavaScript code into the page await frame.evaluate(js_code) + js_bool = str(self.overlay_show_details).lower() if self.overlay_is_collapsed: - await frame.evaluate("showCollapsedOverlay();") + await frame.evaluate(f"showCollapsedOverlay('{self.overlay_processing_state}', {js_bool});") else: - await frame.evaluate("showExpandedOverlay();") + await frame.evaluate(f"showExpandedOverlay('{self.overlay_processing_state}', {js_bool});") + #update chat history in the overlay await self.update_overlay_chat_history(frame) @@ -87,6 +93,32 @@ def update_overlay_state(self, is_collapsed: bool): self.overlay_is_collapsed = is_collapsed + + async def update_overlay_show_details(self, show_details: bool, page: Page): + """ + Updates the state of the overlay to either show steps or not. + + Args: + show_details (bool): True to show steps, False to hide them. + """ + self.overlay_show_details = show_details + await self.update_overlay_chat_history(page) + + + async def update_processing_state(self, state: str, page: Page): + """ + Updates the processing state of the overlay. + + Args: + state (str): The processing state to update.
+ """ + self.overlay_processing_state = state + try: + js_bool = str(self.overlay_is_collapsed).lower() + await page.evaluate(f"updateOverlayState('{self.overlay_processing_state}', {js_bool});") + except Exception as e: + logger.debug(f"JavaScript error: {e}") + async def update_overlay_chat_history(self, frame_or_page: Frame | Page): """ Updates the chat history in the overlay. If the overlay is expanded and not currently being updated, @@ -110,16 +142,32 @@ async def update_overlay_chat_history(self, frame_or_page: Frame | Page): await frame_or_page.evaluate("clearOverlayMessages();") for message in self.conversation_history: safe_message = escape_js_message(message["message"]) + safe_message_type = escape_js_message(message.get("message_type", MessageType.STEP.value)) if message["from"] == "user": await frame_or_page.evaluate(f"addUserMessage({safe_message});") else: - await frame_or_page.evaluate(f"addSystemMessage({safe_message});") + #choose which message types to be shown depending on UI setting + if self.overlay_show_details == False: # noqa: E712 + if message["message_type"] not in (MessageType.PLAN.value, MessageType.QUESTION.value, MessageType.ANSWER.value, MessageType.INFO.value): + continue + else: + if message["message_type"] not in (MessageType.PLAN.value, MessageType.QUESTION.value, MessageType.ANSWER.value, MessageType.INFO.value, MessageType.STEP.value): + continue + + js_code = f"addSystemMessage({safe_message}, is_awaiting_user_response=false, message_type={safe_message_type});" + await frame_or_page.evaluate(js_code) logger.debug("Chat history updated in overlay, removing update lock flag") except Exception: traceback.print_exc() finally: self.__update_overlay_chat_history_running = False + def clear_conversation_history(self): + """ + Clears the conversation history.
+ """ + self.conversation_history = [] + self.add_default_system_messages() def get_conversation_history(self): """ @@ -130,6 +178,7 @@ def get_conversation_history(self): """ return self.conversation_history + def new_user_message(self, message: str): """ Adds a new user message to the conversation history. @@ -137,25 +186,26 @@ def new_user_message(self, message: str): Args: message (str): The message text to add. """ + self.conversation_history.append({"from":"user", "message":message}) - def new_system_message(self, message: str): + def new_system_message(self, message: str, type:MessageType=MessageType.STEP): """ Adds a new system message to the conversation history. Args: message (str): The message text to add. """ - self.conversation_history.append({"from":"system", "message":message}) + self.conversation_history.append({"from":"system", "message":message, "message_type":type.value}) + print(f"Adding system message: {message}") def add_default_system_messages(self): """ Adds default system messages to the conversation history to greet the user or provide initial instructions. 
""" - self.new_system_message(json.dumps("Agent-E at your service, what can I do for you?")) - + pass async def command_completed(self, page: Page, command: str, elapsed_time: float|None = None): """ diff --git a/ae/main.py b/ae/main.py index b0ebd40..cd1a4a3 100644 --- a/ae/main.py +++ b/ae/main.py @@ -3,5 +3,5 @@ from ae.core.system_orchestrator import SystemOrchestrator if __name__ == "__main__": - orchestrator = SystemOrchestrator(agent_scenario="user_proxy,browser_nav_agent",input_mode="GUI_ONLY") + orchestrator = SystemOrchestrator(agent_scenario="user,planner_agent,browser_nav_agent,browser_nav_executor",input_mode="GUI_ONLY") asyncio.run(orchestrator.start()) diff --git a/ae/ui/injectOverlay.js b/ae/ui/injectOverlay.js index 59ebf94..f707de3 100644 --- a/ae/ui/injectOverlay.js +++ b/ae/ui/injectOverlay.js @@ -6,20 +6,8 @@ function injectOveralyStyles() { let style = document.createElement('style'); // Set the styles style.textContent = ` - @font-face { - font-family: 'CircularXX'; - src: url('https://assets.website-files.com/627028e6193b2d840a066eab/627028e6193b2d9dd2066edf_CircularXXWeb-Book.woff2') format('woff2'); - font-weight: 400; - font-style: normal; - font-display: auto; -} -@font-face { - font-family: 'CircularXXLight'; - src: url('https://assets.website-files.com/627028e6193b2d840a066eab/627028e6193b2d710b066eda_CircularXXWeb-Light.woff2') format('woff2'); - font-weight: 300; - font-style: normal; - font-display: auto; -} +@import url(https://fonts.googleapis.com/earlyaccess/notosanssc.css); + ::-webkit-scrollbar { width: 6px; border: solid 3px transparent; @@ -38,91 +26,146 @@ function injectOveralyStyles() { background-color: rgba(255, 255, 255, 0.6); } - .disabled { - opacity: 0.95; + .agente-pre-line { + white-space: pre-line; !important; } - .pre-line { - white-space: pre-line; + #agente-closebutton{ + width:30px; + height:30px; + min-width:30px; + min-height:30px; + margin-left: auto; + color:darkgray; + cursor: pointer; + background: 
transparent; + transition: transform 0.2s ease; + border: None; + } + #agente-closebutton:hover{ + transform: scale(1.1); } - .enabled { - opacity: 1; + #agente-closebutton:active{ + transform: scale(0.8); } - #closebutton{ - width:25px; - height:25px; - min-width:25px; - min-height:25px; - position: absolute; - top: 10px; - right: 10px; - color:darkgray; - cursor: pointer; - border: 1px solid lightgray; - z-index: 20000001; - background: white; + @keyframes agente-gradient-animation { + 0% {background-position: 100% 0%} + 100% {background-position: 15% 100%} + } + @keyframes agente-rotate { + 100% { + transform: rotate(1turn); + } } - #closebutton:hover{ - border: 1px solid orange; - color:black; - font-weight: bold; + + @keyframes automation_highlight_fadeout_animation { + 0% { border-color: rgba(128, 0, 128, 1); } + 50% { border-color: rgba(128, 0, 128, 1); } + 100% { border-color: rgba(128, 0, 128, 0); } + } + + .agente-ui-automation-highlight { + border-width: 2px !important; + border-style: solid !important; + animation: automation_highlight_fadeout_animation 5s linear 1 forwards !important; + } + + .agente-processing{ + background: linear-gradient(90deg, + rgba(255, 0, 255, 1) 0%, /* Bright Magenta */ + rgba(0, 191, 255, 1) 100% /* Deep Sky Blue */ + ); + background-size: 100% 200%; + animation: agente-rotate 1s linear infinite; + } + + .agente-init{ + background: darkgray; + box-shadow: rgba(120, 120, 120, 0.3) 0px 0px 20px + } + + .agente-done{ + background: lightgreen; + } + + .agente-processingLine { + background: linear-gradient(45deg, + rgba(255, 0, 0, 1) 0%, /* Red */ + rgba(255, 127, 0, 1) 25%, /* Orange */ + rgba(0, 255, 0, 1) 50%, /* Green */ + rgba(0, 0, 255, 1) 75%, /* Blue */ + rgba(255, 0, 0, 1) 90%, /* Red */ + rgba(255, 0, 0, 1) 100% /* Red */ + ); + background-size: 500% 100%; + animation: agente-gradient-animation 3s linear infinite; + } + + .agente-initStateLine{ + background: lightgray; + } + + .agente-doneStateLine{ + background: 
lightgreen; } - .collapsed{ + .agente-collapsed{ cursor: pointer; background-color: rgba(0, 0, 0, 0.1); background-repeat: no-repeat; background-position: center; background-size: cover; - width: 5vh; - height: 5vh; + width: 6vh; + height: 6vh; border-radius: 50%; right: 1.5vw; bottom: 1.5vw; - padding: 0.5%; box-shadow: rgba(0, 0, 0, 0.3) 0px 0px 20px } - .chat-container { + .agente-chat-container { margin:1%,1%,1%,1%; - width: 25vw; - height:60vh; + width: 30vw; + min-width: 350px; + height:70vh; bottom: 2vh; position: relative; + display: flex; + flex-direction: column; top: 6%; - box-sizing: border-box; /* Include padding in the width and height calculations */ + padding: 1%; + box-sizing: border-box; } - .icon{ - width: 25px; - border-radius: 50%; - height: 25px; - } - - - - .chat-input{ + .agente-chat-input{ display: flex; flex-direction: row; - gap:2%; - justify-content: center; align-items: center; - width: 100%; - margin-top:2vh; + width: 95%; + margin-top:1.5vh; + } + + .agente-agent{ + justify-content: flex-start; + } + + .agente-user{ + justify-content: flex-end; } - #user-input { + #agente-user-input { flex: 1; padding: 3px 3px; - border: 1px solid #ccc; - border-radius: 3px; - width:80%; + border: transparent; + width:100%; resize: none; - font-family: 'CircularXX'; - font-size: 14px; + font-family: 'Noto Sans SC'; + font-size: 1.6vh; + min-font-size: 12px; + line-height: 1.5; display: flex; vertical-align: middle; text-align: middle; @@ -131,205 +174,248 @@ function injectOveralyStyles() { border-color: #ccc; background: white; color:black; - line-height: 1.2; min-height: calc(1.2em * 2); scrollbar-width: thin; } - #user-input:focus { + #agente-user-input:focus { outline: none !important; - border:1px solid orange; - box-shadow: 0 0 10px #719ECE; - } - #send-btn { - padding: 5px; - margin-left: 5px; - border: 1px solid #ccc; - border-radius: 3px; - cursor: pointer; - color:black; - opacity: 0.9; - background: white; - height:100%; - font-family: 
'CircularXX'; + border:0px solid transparent !important; + box-shadow: none !important; } - #send-btn:hover{ - background: orange; - opacity: 1; + #agente-send-btn { + cursor: pointer; + transition: transform 0.2s ease; + } + #agente-send-btn:hover{ + transform: scale(1.1); } - .highlight_overlay{ + .agente-highlight_overlay{ box-shadow: 1px 1px 1px 1px rgb(50 50 50 / 40%); - border-radius: 10px; - border: 1px solid #ccc; + border-radius: 16px; + border: 1px solid #E1DEE2; bottom:3px; right:5px; - padding: 1%; - padding-top:30px; - background: rgba(255, 255, 255, 1.0); + background: #FBFAFA; } - #chat-box { + + #agente-chat-box { overflow-y: auto; scrollbar-width: thin; height: 90%; - width:100%; display: flex; flex-direction: column; gap:1%; - margin:1%; + margin:1% 5%; padding-bottom:1%; margin-top:10%; } - #agentDriveAutoOverlay { + #agente-overlay { position: fixed; - min-width: 30px; - min-height: 30px; + min-width: 50px; + min-height: 50px; margin-left: auto; margin-right: auto; z-index:20000000; scrollbar-color: gray lightgray; margin-bottom: 1%; + display: flex; + flex-direction: column; } - .agent1{ - background: blueviolet; - border-radius: 50%; - } - - .agent2{ - background: rgba(150, 255, 150, 1); - border-radius: 50%; - } - .user{ - background: orange; - border-radius: 50%; - } - - .input-container { + .agente-input-container { display: flex; - padding: 0%; - height:8%; + flex-direction: column; + margin: 1% 3%; + padding: 1%; + height:20%; + background: white; + border: 1px solid #E1DEE2; + border-radius: 8px; } - .chat{ + .agente-chat{ width: 80%; color: black; overflow-wrap: break-word; - font-family: 'CircularXX'; - font-size: 14px; + font-family: 'Noto Sans SC'; + } - .agent1text{ + .agente-systemMessage{ text-align: left; justify-content: flex-start; - margin-right: auto; - margin-left: auto; - font-family: 'CircularXX'; - padding: 5%; + font-family: 'Noto Sans SC'; + padding: 2% 4%; + font-size: 1.5vh; + min-font-size: 12px; min-height: 30px; - 
background: linear-gradient(180deg, rgba(0, 0, 0, 0.04) 0%, rgba(0, 0, 0, 0.12) 100%); - box-shadow: 1px 1px 1px 1px rgb(150 150 150 / 60%); - padding-left: 10px; - border-radius: 20px; - border: 1px solid blueviolet; - width:72%; - } - .agent2text{ - text-align: left; - justify-content: flex-start; - margin-right: auto; - margin-left: auto; - font-family: 'CircularXX'; - padding: 5%; - min-height: 30px; - background: linear-gradient(180deg, rgba(0, 0, 0, 0.04) 0%, rgba(0, 0, 0, 0.12) 100%); - box-shadow: 1px 1px 1px 1px rgb(150 150 150 / 60%); - padding-left: 10px; - border-radius: 20px; - border: 1px solid rgba(150, 255, 150, 1); - width:72%; + background: #EEEEEF; + line-height: 1.7; + border-radius: 10px; + width:auto; + max-width: 90%; } - .usertext{ + .agente-usertext{ text-align: right; - justify-content: flex-start; - margin-right: auto; - margin-left: auto; - font-family: 'CircularXX'; - padding: 5%; + justify-content: flex-end; + align-items: flex-end; + font-family: 'Noto Sans SC'; + font-size: 1.5vh; + min-font-size: 12px; + padding: 2% 4%; + line-height: 1.7; min-height: 30px; - width:72%; - background: linear-gradient(180deg, rgba(0, 0, 0, 0.04) 0%, rgba(0, 0, 0, 0.20) 100%) - /* White Glass Effect */ - box-shadow: 8px 8px 16px rgba(0, 0, 0, 0.12), inset 1px 1px 2px rgba(255, 255, 255, 0.64), inset -1px -1px 2px rgba(255, 255, 255, 0.4); - border-radius: 20px; + width:auto; + background: #ECEBF3; + border-radius: 10px; color: black; - border: 1px solid orange; } - - @keyframes automation_blink { - 0% { border-color: rgba(128, 0, 128, 1); } - 50% { border-color: rgba(128, 0, 128, 1); } - 100% { border-color: rgba(128, 0, 128, 0); } + .agente-agentstep{ + color: #4B4B4B; } - - .ui_automation_pulsate { - border-width: 2px !important; - border-style: solid !important; - animation: automation_blink 5s linear 1 forwards !important; + .agente-agentplan{ + color: #4B4B4B; + } + .agente-agentanswer{ + color: black; } + + + .agente-toggle { + 
-webkit-appearance: none; + -moz-appearance: none; + appearance: none; + margin: 0; + display: inline-block; + position: relative; + border-radius: 50px; + overflow: hidden; + outline: none; + border: none; + cursor: pointer; + background-color: #E1DEE2; + transition: background-color ease 0.3s; + align-self: center; +} +.agente-toggle:focus { + border: none; !important; + outline: none; !important; +} +.agente-toggle:before { + content: ""; + display: block; + position: absolute; + z-index: 2; + width: 20px; + height: 20px; + background: #fff; + left: 2px; + top: 2px; + border-radius: 50%; + color: #fff; + text-shadow: -1px -1px rgba(0,0,0,0.15); + white-space: nowrap; + box-shadow: 0 1px 2px rgba(0,0,0,0.2); + transition: all cubic-bezier(0.3, 1.5, 0.7, 1) 0.3s; +} + +.agente-toggle:checked { + background-color: #786E96; +} + +.agente-toggle:checked:before { + left: 20px; +} `; // Append the style element to the head of the document document.head.appendChild(style); } let savedSelection = null; +let show_details = true; -function showCollapsedOverlay() { + +function showCollapsedOverlay(processing_state = "processing", steps) { + show_details = steps; removeOverlay(); window.overlay_state_changed(true); - let newDiv = document.createElement("div"); - newDiv.id = "agentDriveAutoOverlay"; - newDiv.classList.add("collapsed"); - newDiv.setAttribute("aria-hidden", "true"); - - let svg = `<svg xmlns="http://www.w3.org/2000/svg" height="800" width="800" viewBox="0 0 64 64" xml:space="preserve"><style>.st3{fill:#fff}.st4{fill:#4f5d73}</style><g id="Layer_1"><circle cx="32" cy="32" r="32" fill="#9c27b0"/><path d="M52 32c0-9.9-9-18-20-18s-20 8.1-20 18c0 9.6 8.3 17.4 18.8 17.9.7 3.7 1.2 6.1 1.2 6.1s5-3 9.6-8.2C47.8 44.7 52 38.8 52 32z" fill="#231f20" opacity=".2"/><path class="st3" d="M49 28.8C49 43.8 32 54 32 54s-9.4-42 0-42 17 7.5 17 16.8z" fill="#000000"/><ellipse class="st3" cx="32" cy="30" rx="20" ry="18" fill="#000000"/><circle class="st4" cx="32" cy="30" r="2" 
fill="#000000"/><circle class="st4" cx="40" cy="30" r="2" fill="#000000"/><circle class="st4" cx="24" cy="30" r="2" fill="#000000"/></g></svg>`; - let encodedSvg = encodeURIComponent(svg); + let collapsed_agente = document.createElement("div"); + collapsed_agente.id = "agente-overlay"; + collapsed_agente.classList.add("agente-collapsed"); + collapsed_agente.style.backgroundColor = "transparent"; + collapsed_agente.setAttribute("aria-hidden", "true"); + collapsed_agente.style.justifyContent = "center"; + let wrapper = document.createElement("div"); + wrapper.style.position = "relative"; + wrapper.style.width = "100%"; + wrapper.style.height = "100%"; + wrapper.style.justifyContent = "center"; + let logodiv= document.createElement("div"); + logodiv.style.width = "90%"; + logodiv.style.height = "90%"; + logodiv.style.left = "5%"; + logodiv.style.top = "5%"; + let borderdiv = document.createElement("div"); + borderdiv.style.width = "100%"; + borderdiv.style.height = "100%"; + borderdiv.style.borderRadius = "50%"; + + let logo = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><rect x="6.5" y="7.5" width="11" height="11" rx="0.5" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 6 10)" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 20 10)" stroke="#827C8C"/><path d="M12 4V7.5" stroke="#827C8C" stroke-linecap="round"/><rect x="8.5" y="11.5" width="7" height="3" rx="1.5" stroke="#827C8C"/></svg>`; + let encodedSvg = encodeURIComponent(logo); let svgUrl = 'data:image/svg+xml;utf8,' + encodedSvg; - - document.body.appendChild(newDiv); - let element = document.getElementById('agentDriveAutoOverlay'); - element.style.backgroundImage = `url("${svgUrl}")`; - document.getElementById('agentDriveAutoOverlay').addEventListener('mouseover', function () { + logodiv.style.backgroundImage = `url("${svgUrl}")`; + logodiv.style.backgroundRepeat 
= "no-repeat"; + logodiv.style.backgroundSize = "contain"; + logodiv.style.borderRadius = "50%"; + logodiv.style.backgroundPosition = "center"; + logodiv.style.backgroundColor = "white"; + logodiv.style.alignSelf = "center"; + borderdiv.style.position = "absolute"; + borderdiv.style.top = "0"; + borderdiv.style.left = "0"; + borderdiv.id="AgentEOverlayBorder"; + logodiv.style.position = "absolute"; + logodiv.style.justifySelf = "center"; + wrapper.appendChild(borderdiv); + wrapper.appendChild(logodiv); + collapsed_agente.appendChild(wrapper); + document.body.appendChild(collapsed_agente); + + updateOverlayState(processing_state, true); + + let element = document.getElementById('agente-overlay'); + document.getElementById('agente-overlay').addEventListener('mouseover', function () { this.style.transform = 'scale(1.1)'; }); - document.getElementById('agentDriveAutoOverlay').addEventListener('mouseout', function () { + document.getElementById('agente-overlay').addEventListener('mouseout', function () { this.style.transform = 'scale(1)'; }); - document.getElementById('agentDriveAutoOverlay').addEventListener('click', function () { - showExpandedOverlay(); + document.getElementById('agente-overlay').addEventListener('click', function () { + let ui_state = document.getElementById("AgentEOverlayBorder").classList.contains("agente-init") ? "init" : document.getElementById("AgentEOverlayBorder").classList.contains("agente-processing") ? 
"processing" : "done"; + showExpandedOverlay(ui_state, show_details); }); } function removeOverlay() { - let element = document.getElementById("agentDriveAutoOverlay"); + let element = document.getElementById("agente-overlay"); if (element) { element.parentNode.removeChild(element); } } - -function clearOverlayMessages() { +function clearOverlayMessages(keep_default=false) { try { - let chatBox = document.getElementById('chat-box'); + let chatBox = document.getElementById('agente-chat-box'); if (!chatBox) { return; } - console.log("Clearing chat box"); while (chatBox.firstChild) { chatBox.removeChild(chatBox.firstChild); } @@ -339,78 +425,235 @@ function clearOverlayMessages() { } } -function createIcon(className) { - let icon = document.createElement("div"); - icon.className = `icon ${className}`; - return icon; +function updateOverlayState(processing_state, is_collapsed) +{ + if (is_collapsed) { + let borderdiv = document.getElementById("AgentEOverlayBorder"); + if (processing_state === "init"){ + borderdiv.classList.add("agente-init"); + borderdiv.classList.remove("agente-processing"); + borderdiv.classList.remove("agente-done"); + } + else if (processing_state === "processing"){ + borderdiv.classList.remove("agente-init"); + borderdiv.classList.add("agente-processing"); + borderdiv.classList.remove("agente-done"); + } + else if (processing_state === "done"){ + borderdiv.classList.remove("agente-init"); + borderdiv.classList.remove("agente-processing"); + borderdiv.classList.add("agente-done"); + } + } else { + let animation = document.getElementById("AgentEExpandedAnimation"); + if (processing_state === "init"){ + animation.classList.remove("agente-processingLine"); + animation.classList.add("agente-initStateLine"); + animation.classList.remove("agente-doneStateLine"); + enableOverlay(); + } + else if (processing_state === "processing"){ + animation.classList.add("agente-processingLine"); + animation.classList.remove("agente-initStateLine"); + 
animation.classList.remove("agente-doneStateLine"); + disableOverlay(); + } + else if (processing_state === "done"){ + animation.classList.remove("agente-processingLine"); + animation.classList.remove("agente-initStateLine"); + animation.classList.add("agente-doneStateLine"); + enableOverlay(); + } + } } -function showExpandedOverlay() { +function showExpandedOverlay(processing_state = "init", show_steps=true) { + ui_state = processing_state; + show_details = show_steps; + let agente_logo = `<svg width="85" height="12" viewBox="0 0 85 12" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M0 11.8027L3.43562 0.213699H8.35069L11.8027 11.8027H9.3863L8.23562 7.85753H3.53425L2.38356 11.8027H0ZM4.10959 5.86849H7.66027L6.18082 0.80548H5.58904L4.10959 5.86849Z" fill="#6B6673"/><path d="M19.0946 12C15.6096 12 13.7028 9.56712 13.7028 6.09863C13.7028 2.4 15.9055 0 19.4562 0C22.4151 0 24.5685 1.70959 24.9631 4.35616H22.6124C22.3822 2.87671 21.2151 1.9726 19.5713 1.9726C17.3192 1.9726 16.0535 3.58356 16.0535 6.09863C16.0535 8.35068 17.0726 10.011 19.637 10.011C21.7576 10.011 22.974 8.94247 22.974 7.15068H19.374V5.40822H23.9768C24.8151 5.40822 25.2918 5.85205 25.2918 6.69041V11.8027H23.0069V10.7671L23.4672 8.92603H22.8589C22.8754 9.6 22.4973 12 19.0946 12Z" fill="#6B6673"/><path d="M28.7192 11.8027V0.213699H37.3987V2.20274H31.0206V5.04658H36.5768V6.95342H31.0206V9.8137H37.3987V11.8027H28.7192Z" fill="#6B6673"/><path d="M40.533 11.8027V0.213699H45.0536L49.1631 11.211H49.7385L49.3275 9.76438V0.213699H51.6125V11.8027H47.0919L42.9823 0.80548H42.3905L42.8179 2.25205V11.8027H40.533Z" fill="#6B6673"/><path d="M54.4378 0.213699H64.4159V2.20274H60.5693V11.8027H58.2844V2.20274H54.4378V0.213699Z" fill="#6B6673"/><path d="M63.9401 6.6411H72.4551V8.30137H63.9401V6.6411Z" fill="#6B6673"/><path d="M75.3643 11.8027V0.213699H84.0438V2.20274H77.6657V5.04658H83.2219V6.95342H77.6657V9.8137H84.0438V11.8027H75.3643Z" fill="#6B6673"/></svg>`; + let close_icon = `<svg width="24" height="24" 
viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M5 10L10 10L10 5" stroke="#827C8C"/><path d="M19 14L14 14L14 19" stroke="#827C8C"/><path d="M14 5L14 10L19 10" stroke="#827C8C"/><path d="M10 19L10 14L5 14" stroke="#827C8C"/><path d="M6.35355 5.64645C6.15829 5.45118 5.84171 5.45118 5.64645 5.64645C5.45118 5.84171 5.45118 6.15829 5.64645 6.35355L6.35355 5.64645ZM10.3536 9.64645L6.35355 5.64645L5.64645 6.35355L9.64645 10.3536L10.3536 9.64645Z" fill="#827C8C"/><path d="M17.6464 18.3536C17.8417 18.5488 18.1583 18.5488 18.3536 18.3536C18.5488 18.1583 18.5488 17.8417 18.3536 17.6464L17.6464 18.3536ZM13.6464 14.3536L17.6464 18.3536L18.3536 17.6464L14.3536 13.6464L13.6464 14.3536Z" fill="#827C8C"/><path d="M18.3536 6.35355C18.5488 6.15829 18.5488 5.84171 18.3536 5.64645C18.1583 5.45119 17.8417 5.45119 17.6464 5.64645L18.3536 6.35355ZM14.3536 10.3536L18.3536 6.35355L17.6464 5.64645L13.6464 9.64645L14.3536 10.3536Z" fill="#827C8C"/><path d="M5.64645 17.6464C5.45118 17.8417 5.45118 18.1583 5.64645 18.3536C5.84171 18.5488 6.15829 18.5488 6.35355 18.3536L5.64645 17.6464ZM9.64645 13.6464L5.64645 17.6464L6.35355 18.3536L10.3536 14.3536L9.64645 13.6464Z" fill="#827C8C"/></svg>`; + let icon = `<svg width="24" height="24" viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg"><rect x="6.5" y="7.5" width="11" height="11" rx="0.5" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 6 10)" stroke="#827C8C"/><rect x="-0.5" y="0.5" width="3" height="5" rx="0.5" transform="matrix(-1 0 0 1 20 10)" stroke="#827C8C"/><path d="M12 4V7.5" stroke="#827C8C" stroke-linecap="round"/><rect x="8.5" y="11.5" width="7" height="3" rx="1.5" stroke="#827C8C"/></svg>`; removeOverlay(); window.overlay_state_changed(false); - console.log("showing expanded overlay"); let newDiv = document.createElement("div"); - newDiv.id = "agentDriveAutoOverlay"; - newDiv.classList.add("highlight_overlay"); - 
newDiv.classList.add("agentDriveAutoOverlay"); + newDiv.id = "agente-overlay"; + newDiv.classList.add("agente-highlight_overlay"); newDiv.setAttribute("aria-hidden", "true"); + newDiv.setAttribute("tabindex", "0"); + + let header = document.createElement("div"); + header.style.display = "flex"; + header.style.flexDirection = "row"; + header.style.margin = "4%"; + + let logoIcon= document.createElement("div"); + logoIcon.style.width = "25px"; + logoIcon.style.height = "25px"; + logoIcon.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(icon)}')`; + logoIcon.style.backgroundRepeat = "no-repeat"; + logoIcon.style.backgroundSize = "contain"; + logoIcon.style.backgroundPosition = "bottom"; + logoIcon.style.order = 1; + logoIcon.style.alignSelf = "flex-end"; + logoIcon.style.marginRight = "1%"; + + let logoDiv = document.createElement("div"); + logoDiv.style.width = "100px"; + logoDiv.style.height = "25px"; + logoDiv.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(agente_logo)}')`; + logoDiv.style.backgroundRepeat = "no-repeat"; + logoDiv.style.backgroundSize = "contain"; + logoDiv.style.backgroundPosition = "bottom"; + // Style the logoDiv and button + logoDiv.style.order = 1; + let closeButton = document.createElement("button"); - closeButton.id = "closebutton"; - closeButton.textContent = "X"; + closeButton.id = "agente-closebutton"; + closeButton.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(close_icon)}')`; + closeButton.style.backgroundRepeat = "no-repeat"; + closeButton.style.backgroundSize = "contain"; + closeButton.style.backgroundPosition = "bottom"; closeButton.onclick = function () { - showCollapsedOverlay(); + let ui_state = document.getElementById("AgentEExpandedAnimation").classList.contains("agente-initStateLine") ? "init" : document.getElementById("AgentEExpandedAnimation").classList.contains("agente-processingLine") ? 
"processing" : "done"; + showCollapsedOverlay(ui_state, show_details); }; - + closeButton.style.order = 3; + header.appendChild(logoIcon); + header.appendChild(logoDiv); + let animation = document.createElement("div"); + animation.id = "AgentEExpandedAnimation"; + animation.style.height = "2px"; + animation.style.width = "100%"; + + header.appendChild(closeButton); // Append the close button to the newDiv - newDiv.appendChild(closeButton); + newDiv.appendChild(header); + + newDiv.appendChild(animation); let chatContainer = document.createElement("div"); - chatContainer.className = "chat-container"; + chatContainer.className = "agente-chat-container"; let chatBox = document.createElement("div"); - chatBox.id = "chat-box"; + chatBox.id = "agente-chat-box"; let chatInput = document.createElement("div"); - chatInput.className = "chat-input"; - - let iconAgent1 = createIcon("agent1"); - + chatInput.className = "agente-chat-input"; chatBox.appendChild(chatInput); let inputContainer = document.createElement("div"); - inputContainer.className = "input-container"; - + inputContainer.className = "agente-input-container"; + inputContainer.id = "agente-input-container"; let userInput = document.createElement("textarea"); - userInput.id = "user-input"; - userInput.placeholder = "Type the task for the agent..."; + userInput.id = "agente-user-input"; + userInput.placeholder = "What can I help you solve today?"; + userInput.addEventListener('input', function(event) { + let text = event.target.value; + if (text.trim() == "") { + let button_disabled_svg =`<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg"><rect width="40" height="40" rx="4" fill="#EEEEEF"/><path d="M15 20H25" stroke="#AEA9B4" stroke-width="1.5"/><path d="M20 15L25 20L20 25" stroke="#AEA9B4" stroke-width="1.5"/></svg>`; + let sendBtn = document.getElementById('agente-send-btn'); + sendBtn.style.backgroundImage = 
`url('data:image/svg+xml;utf8,${encodeURIComponent(button_disabled_svg)}')`; + } + else{ + let button_enabled_svg= `<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg"><rect width="40" height="40" rx="4" fill="#252539"/><path d="M15 20H25" stroke="white" stroke-width="1.5"/><path d="M20 15L25 20L20 25" stroke="white" stroke-width="1.5"/></svg>`; + let sendBtn = document.getElementById('agente-send-btn'); + sendBtn.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(button_enabled_svg)}')`; + } + }); + let userinput_footer = document.createElement("div"); + userinput_footer.style.display = "flex"; + userinput_footer.style.flexDirection = "row"; + userinput_footer.style.justifyContent = "space-between"; + userinput_footer.style.alignItems = "center"; + userinput_footer.style.height = "40%"; + userinput_footer.style.margin = "2% 1%"; + userinput_footer.id="userinput_section" + + let toggleLabel = document.createElement("label"); // Create a new label element + toggleLabel.textContent = "Show Details"; // Set the text content of the label + toggleLabel.style.color = "#6B6673"; // Set the color of the label + toggleLabel.style.fontFamily = "Noto Sans SC"; // Set the font of the label + toggleLabel.style.fontSize = "14px"; // Set the font size of the label + toggleLabel.style.fontWeight = "400"; // Set the font weight of the label + toggleLabel.style.margin = "0px"; // Add some margin to the right of the label + toggleLabel.style.marginRight = "10px"; // Add some margin to the right of the label + + let toggleSwitch = document.createElement("input"); + + toggleSwitch.type = "checkbox"; + toggleSwitch.className = "agente-toggle"; + toggleSwitch.style.width = "44px"; + toggleSwitch.style.height = "24px"; + toggleSwitch.style.margin = "0px"; + + if (show_details){ + toggleSwitch.checked = true; + } + else{ + toggleSwitch.checked = false; + } + + toggleSwitch.addEventListener('change', function() { + 
if(this.checked) { + show_details = true; + window.show_steps_state_changed(true) + } else { + show_details = false; + window.show_steps_state_changed(false) + } +}); - let sendBtn = document.createElement("button"); - sendBtn.id = "send-btn"; - sendBtn.textContent = "Send"; + let sendicon =`<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg"><rect width="40" height="40" rx="4" fill="#EEEEEF"/><path d="M15 20H25" stroke="#AEA9B4" stroke-width="1.5"/><path d="M20 15L25 20L20 25" stroke="#AEA9B4" stroke-width="1.5"/></svg>`; + let sendBtn = document.createElement("div"); + sendBtn.id = "agente-send-btn"; + sendBtn.style.backgroundImage = `url('data:image/svg+xml;utf8,${encodeURIComponent(sendicon)}')`; + sendBtn.style.backgroundRepeat = "no-repeat"; + sendBtn.style.backgroundSize = "contain"; + sendBtn.style.backgroundPosition = "right"; + sendBtn.style.width = "8%"; + sendBtn.style.height = "100%"; + sendBtn.style.marginLeft = "auto"; + + userinput_footer.appendChild(toggleLabel); // Add the label to the div + userinput_footer.appendChild(toggleSwitch); + userinput_footer.appendChild(sendBtn); inputContainer.appendChild(userInput); - inputContainer.appendChild(sendBtn); + inputContainer.appendChild(userinput_footer); chatContainer.appendChild(chatBox); chatContainer.appendChild(inputContainer); newDiv.appendChild(chatContainer); - document.body.appendChild(newDiv); + let disclaimer = document.createElement("p"); + disclaimer.style.fontFamily = "Noto Sans SC"; + disclaimer.style.fontSize = "12px"; + disclaimer.style.color = "#6B6673"; + disclaimer.style.alignSelf = "center"; + disclaimer.style.position = "absolute"; + disclaimer.style.bottom = "0%"; + disclaimer.style.margin = "0% 0% 1% 0%"; + disclaimer.textContent = "Agent-E may make mistakes. 
Verify key info."; + + newDiv.appendChild(disclaimer); - document.getElementById('send-btn').addEventListener('click', function () { - let task = document.getElementById('user-input').value - if (task && !isDisabled()) { + document.body.appendChild(newDiv); + updateOverlayState(processing_state, false); + document.getElementById('agente-send-btn').addEventListener('click', function () { + let task = document.getElementById('agente-user-input').value + let task_trimmed = task.trim(); + if (task_trimmed && !isDisabled() && task_trimmed.length > 0) { if (awaitingUserResponse) { addUserMessage(task); - document.getElementById('user-input').value = ""; + document.getElementById('agente-user-input').value = ""; } else { - console.log(`Sending ${task} to server`); - + clearOverlayMessages(); addUserMessage(task); + disableOverlay(); window.process_task(task) - document.getElementById('user-input').value = ""; + document.getElementById('agente-user-input').value = ""; } } else { @@ -421,9 +664,8 @@ function showExpandedOverlay() { userInput.addEventListener('focus', function() { if (window.getSelection().rangeCount > 0) { let selectedText = window.getSelection().toString(); - console.log(selectedText); if (selectedText) { - document.getElementById('user-input').value = selectedText + '\n'; + document.getElementById('agente-user-input').value = selectedText + '\n'; setTimeout(function() { userInput.selectionStart = userInput.selectionEnd = userInput.value.length; userInput.scrollTop = userInput.scrollHeight; @@ -441,12 +683,12 @@ userInput.addEventListener('blur', function() { } }); - document.getElementById('user-input').addEventListener('keydown', function (event) { + document.getElementById('agente-user-input').addEventListener('keydown', function (event) { // Check if the pressed key is the Enter key if (event.key === "Enter") { event.preventDefault(); - let targetElement = document.getElementById('send-btn'); + let targetElement = 
document.getElementById('agente-send-btn'); // Create a new click event let clickEvent = new MouseEvent('click', { @@ -463,46 +705,48 @@ userInput.addEventListener('blur', function() { function focusOnOverlayInput() { - document.getElementById('user-input').focus(); + document.getElementById('agente-user-input').focus(); } -function addMessage(message, sender) { - //console.log(`Adding ${sender} message: ${message}`); +function addMessage(message, sender, message_type = "plan") { let newDiv = document.createElement("div"); - newDiv.classList.add("chat-input"); - - let iconDiv1 = document.createElement("div"); - iconDiv1.classList.add("icon"); - + newDiv.classList.add("agente-chat-input"); let chatDiv = document.createElement("div"); - chatDiv.classList.add("chat"); - - let iconDiv2 = document.createElement("div"); - iconDiv2.classList.add("icon"); + chatDiv.classList.add("agente-chat"); - newDiv.appendChild(iconDiv1); - newDiv.appendChild(chatDiv); - newDiv.appendChild(iconDiv2); let parsedMessage = message; try { parsedMessage = JSON.parse(message); } catch (e) { - //console.log("Message is not in JSON format, using original message."); + console.log("Message is not in JSON format, using original message."); } // Customize based on the sender if (sender === "system") { - iconDiv1.classList.add("agent1"); - chatDiv.classList.add("agent1text", "pre-line"); - chatDiv.innerText = parsedMessage; + newDiv.classList.add("agente-agent"); + chatDiv.classList.add("agente-systemMessage", "agente-pre-line"); + if (message_type === "step") { + chatDiv.classList.add("agente-agentstep"); + } + else if (message_type === "plan" || message_type === "question") { + chatDiv.classList.add("agente-agentplan"); + } + + else if (message_type === "answer") { + chatDiv.classList.add("agente-agentanswer"); + } + if ((message_type === "info" && message.includes("Task Completed")) || message_type==="question") { + enableOverlay(); + } + chatDiv.textContent = parsedMessage; } else if (sender 
=== "user") { - iconDiv2.classList.add("user"); - chatDiv.classList.add("usertext", "pre-line"); - chatDiv.innerText = parsedMessage; + newDiv.classList.add("agente-user") + chatDiv.classList.add("agente-usertext", "agente-pre-line"); + chatDiv.textContent = parsedMessage; } - - let chatBox = document.getElementById('chat-box'); + newDiv.appendChild(chatDiv); + let chatBox = document.getElementById('agente-chat-box'); chatBox.appendChild(newDiv); chatBox.scrollTop = chatBox.scrollHeight; newDiv.scrollIntoView({ behavior: 'instant' }); @@ -512,35 +756,42 @@ function addMessage(message, sender) { // Notify the server that the user has responded to the agent's prompt window.user_response(message); } -} +} -function addSystemMessage(message, is_awaiting_user_response = false) { - awaitingUserResponse = is_awaiting_user_response; - addMessage(message, "system"); +function addSystemMessage(message, is_awaiting_user_response = false, message_type = "plan") { + // Function to actually add the message + function executeAddMessage() { + awaitingUserResponse = is_awaiting_user_response; + addMessage(message, "system", message_type); + } + requestAnimationFrame(executeAddMessage); } function addUserMessage(message) { addMessage(message, "user"); } - function disableOverlay() { - let element = document.getElementById("agentDriveAutoOverlay"); - element.classList.remove("enabled"); - element.classList.add("disabled"); + let input_field= document.getElementById("agente-user-input"); + if(input_field){ + input_field.placeholder = "Processing..."; + } } function isDisabled() { - let element = document.getElementById("agentDriveAutoOverlay"); - return element.classList.contains("disabled"); + let input_field= document.getElementById("agente-user-input"); + if(input_field){ + return input_field.placeholder === "Processing..."; + } } + function enableOverlay() { - let element = document.getElementById("agentDriveAutoOverlay"); - element.classList.add("enabled"); - 
element.classList.remove("disabled"); - document.getElementById('user-input').focus(); + let input_field= document.getElementById("agente-user-input"); + if(input_field){ + input_field.placeholder = "What can I help you solve today?"; + } } function commandExecutionCompleted() { diff --git a/ae/user_preferences/user_preferences.txt b/ae/user_preferences/user_preferences.txt index 3065eee..0bac996 100644 --- a/ae/user_preferences/user_preferences.txt +++ b/ae/user_preferences/user_preferences.txt @@ -8,4 +8,5 @@ Email: myemail@gmail.com Phone Number: 123-456-7890 Here are some of my preferences: Shopping Preferences: www.amazon.com -Favorite news source: www.bbc.com \ No newline at end of file +Favorite news source: www.bbc.com +Favorite flight booking site to use with every flight related query: https://www.google.com/travel/flights \ No newline at end of file diff --git a/ae/utils/anthropic_llm_helper.py b/ae/utils/anthropic_llm_helper.py index a3480fb..6fbc870 100644 --- a/ae/utils/anthropic_llm_helper.py +++ b/ae/utils/anthropic_llm_helper.py @@ -1,9 +1,9 @@ -import asyncio -from anthropic import AsyncAnthropic +import os + import anthropic +from anthropic import AsyncAnthropic from dotenv import load_dotenv -import os -from ae.core.prompts import LLM_PROMPTS + class AnthropicLLMHelper: def __init__(self): @@ -14,7 +14,7 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis formatted_user_msgs: list[dict[str, str]] = [] for user_msg in user_msgs: formatted_user_msgs.append({"type": "text", "text": user_msg}) - + try: message = await self.client.messages.create( model=model_name, @@ -24,8 +24,8 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis messages=[ { "role": "user", - "content": formatted_user_msgs - + "content": formatted_user_msgs # type: ignore + } ] ) @@ -34,18 +34,19 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis except anthropic.APIConnectionError as 
e: print("The server could not be reached") print(e.__cause__) # an underlying Exception, likely raised within httpx. - raise Exception(f"Calling {model_name} LLM failed. The server could not be reached. Error: {e}") + raise Exception(f"Calling {model_name} LLM failed. The server could not be reached. Error: {e}") # noqa: B904 except anthropic.RateLimitError as e: print("A 429 status code was received; we should back off a bit.") - raise Exception(f"Calling {model_name} LLM failed. Rate limit error. Error: {e}") + raise Exception(f"Calling {model_name} LLM failed. Rate limit error. Error: {e}") # noqa: B904 except anthropic.APIStatusError as e: print(e.status_code) print(e.response) - raise Exception(f"Calling {model_name} LLM failed. Error: {e}") + raise Exception(f"Calling {model_name} LLM failed. Error: {e}") # noqa: B904 # async def main(): +# from ae.core.prompts import LLM_PROMPTS # helper = AnthropicLLMHelper() # response = await helper.get_chat_completion_response_async(LLM_PROMPTS["SKILLS_HARVESTING_PROMPT"], ["What is the weather like today?"], temperature=0, max_tokens=4000) # print("*******\nResponse: ", response, "\n*******\n") -# asyncio.run(main()) \ No newline at end of file +# asyncio.run(main()) diff --git a/ae/utils/autogen_sequential_function_call.py b/ae/utils/autogen_sequential_function_call.py new file mode 100644 index 0000000..1ef580e --- /dev/null +++ b/ae/utils/autogen_sequential_function_call.py @@ -0,0 +1,84 @@ + +import asyncio +import inspect +from typing import Any + +from autogen import Agent # type: ignore +from autogen import UserProxyAgent # type: ignore + + +class UserProxyAgent_SequentialFunctionExecution(UserProxyAgent): + def __init__(self, *args, **kwargs): # type: ignore + super().__init__(*args, **kwargs) # type: ignore + self.register_reply(Agent, UserProxyAgent_SequentialFunctionExecution.sequential_generate_tool_calls_reply) # type: ignore + + + def sequential_generate_tool_calls_reply( # type: ignore + self, + 
messages: list[dict] | None = None, # type: ignore + sender: Agent | None = None, + config: Any | None = None, + ) -> tuple[bool, dict[str, Any] | None]: + """Generate a reply using tool call.""" + if config is None: + config = self + if messages is None: + messages = self._oai_messages[sender] # type: ignore + message = messages[-1] # type: ignore + tool_returns = [] + skip_flag:bool = False + for tool_call in message.get("tool_calls", []): # type: ignore + function_call = tool_call.get("function", {}) # type: ignore + func = self._function_map.get(function_call.get("name", None), None) # type: ignore + func_return = None + if inspect.iscoroutinefunction(func): # type: ignore + try: + # get the running loop if it was already created + loop = asyncio.get_running_loop() + close_loop = False + except RuntimeError: + # create a loop if there is no running loop + loop = asyncio.new_event_loop() + close_loop = True + if (not skip_flag): + _, func_return = loop.run_until_complete(self.a_execute_function(function_call)) # type: ignore + if close_loop: + loop.close() + else: + if (not skip_flag): + _, func_return = self.execute_function(function_call) # type: ignore + if func_return is None: # type: ignore + if skip_flag: + content = "VERY IMPORTANT: This function could not be executed since previous function resulted in a Webpage change. You must get all_fields DOM and repeat the function if needed." 
+ else: + content = "" + else: + content = func_return.get("content", "") # type: ignore + + if content is None: + content = "" + + if ("as a consequence of this action" in content.lower()): # type: ignore + skip_flag = True + + tool_call_id = tool_call.get("id", None) # type: ignore + if tool_call_id is not None: + tool_call_response = { # type: ignore + "tool_call_id": tool_call_id, + "role": "tool", + "content": content, + } + else: + tool_call_response = { # type: ignore + "role": "tool", + "content": content, + } + tool_returns.append(tool_call_response) # type: ignore + + if tool_returns: + return True, { + "role": "tool", + "tool_responses": tool_returns, + "content": "\n\n".join([self._str_for_tool_response(tool_return) for tool_return in tool_returns]), # type: ignore + } + return False, None diff --git a/ae/utils/dom_helper.py b/ae/utils/dom_helper.py index d7d09a1..11ab38b 100644 --- a/ae/utils/dom_helper.py +++ b/ae/utils/dom_helper.py @@ -1,6 +1,7 @@ import asyncio -from playwright.async_api import ElementHandle, Page +from playwright.async_api import ElementHandle +from playwright.async_api import Page from ae.utils.logger import logger @@ -31,7 +32,7 @@ async def get_element_outer_html(element: ElementHandle, page: Page, element_tag """ tag_name: str = element_tag_name if element_tag_name else await page.evaluate("element => element.tagName.toLowerCase()", element) - attributes_of_interest: list[str] = ['id', 'name', 'aria-label', 'placeholder', 'href', 'src', 'aria-autocomplete', 'role', 'type', + attributes_of_interest: list[str] = ['id', 'name', 'aria-label', 'placeholder', 'href', 'src', 'aria-autocomplete', 'role', 'type', 'data-testid', 'value', 'selected', 'aria-labelledby', 'aria-describedby', 'aria-haspopup'] opening_tag: str = f'<{tag_name}' diff --git a/ae/utils/dom_mutation_observer.py b/ae/utils/dom_mutation_observer.py index 0748887..95a6f5e 100644 --- a/ae/utils/dom_mutation_observer.py +++ b/ae/utils/dom_mutation_observer.py @@ -1,57 
+1,68 @@ +import asyncio import json +from typing import Callable # noqa: UP035 + from playwright.async_api import Page -from typing import List, Callable -from playwright.async_api import Page -import asyncio # Create an event loop loop = asyncio.get_event_loop() -DOM_change_callback: List[Callable[[str], None]] = [] +DOM_change_callback: list[Callable[[str], None]] = [] def subscribe(callback: Callable[[str], None]) -> None: - DOM_change_callback.append(callback) + DOM_change_callback.append(callback) def unsubscribe(callback: Callable[[str], None]) -> None: DOM_change_callback.remove(callback) async def add_mutation_observer(page:Page): - """ - Adds a mutation observer to the page to detect changes in the DOM. + """ + Adds a mutation observer to the page to detect changes in the DOM. When changes are detected, the observer calls the dom_mutation_change_detected function in the browser context. These changes can be detected by subscribing to the dom_mutation_change_detected function by individual skills. - Current implementation only detects when a new node is added to the DOM. + Current implementation only detects when a new node is added to the DOM. However, in many cases, the change could be a change in the style or class of an existing node (e.g. toggle visibility of a hidden node). 
""" - await page.evaluate(""" - console.log('Adding a mutation observer for DOM changes'); - new MutationObserver((mutationsList, observer) => { - let changes_detected = []; - for(let mutation of mutationsList) { - if (mutation.type === 'childList') { - let allAddedNodes=mutation.addedNodes; - for(let node of allAddedNodes) { - if(node.tagName && !['SCRIPT', 'NOSCRIPT', 'STYLE'].includes(node.tagName) && !node.closest('#agentDriveAutoOverlay')) { - let visibility=node.offsetWidth > 0 && node.offsetHeight > 0; - let content = node.innerText.trim(); - if(visibility && node.innerText.trim() && window.getComputedStyle(node).display !== 'none'){ + await page.evaluate(""" + console.log('Adding a mutation observer for DOM changes'); + new MutationObserver((mutationsList, observer) => { + let changes_detected = []; + for(let mutation of mutationsList) { + if (mutation.type === 'childList') { + let allAddedNodes=mutation.addedNodes; + for(let node of allAddedNodes) { + if(node.tagName && !['SCRIPT', 'NOSCRIPT', 'STYLE'].includes(node.tagName) && !node.closest('#agentDriveAutoOverlay')) { + let visibility=true; + let content = node.innerText.trim(); + if(visibility && node.innerText.trim()){ + if(content) { + changes_detected.push({tag: node.tagName, content: content}); + } + } + } + } + } else if (mutation.type === 'characterData') { + let node = mutation.target; + if(node.parentNode && !['SCRIPT', 'NOSCRIPT', 'STYLE'].includes(node.parentNode.tagName) && !node.parentNode.closest('#agentDriveAutoOverlay')) { + let visibility=true; + let content = node.data.trim(); + if(visibility && content && window.getComputedStyle(node.parentNode).display !== 'none'){ if(content && !changes_detected.some(change => change.content.includes(content))) { - changes_detected.push({tag: node.tagName, content: content}); + changes_detected.push({tag: node.parentNode.tagName, content: content}); } - } + } } } } - } - if(changes_detected.length > 0) { - 
window.dom_mutation_change_detected(JSON.stringify(changes_detected)); - } - }).observe(document, {subtree: true, childList: true}); - """) + if(changes_detected.length > 0) { + window.dom_mutation_change_detected(JSON.stringify(changes_detected)); + } + }).observe(document, {subtree: true, childList: true, characterData: true}); + """) async def handle_navigation_for_mutation_observer(page:Page): @@ -61,7 +72,7 @@ async def dom_mutation_change_detected(changes_detected: str): """ Detects changes in the DOM (new nodes added) and emits the event to all subscribed callbacks. The changes_detected is a string in JSON format containing the tag and content of the new nodes added to the DOM. - + e.g. The following will be detected when autocomplete recommendations show up when one types Nelson Mandela on google search [{'tag': 'SPAN', 'content': 'nelson mandela wikipedia'}, {'tag': 'SPAN', 'content': 'nelson mandela movies'}] """ @@ -74,4 +85,4 @@ async def dom_mutation_change_detected(changes_detected: str): await callback(changes_detected) # If the callback is a regular function else: - callback(changes_detected) \ No newline at end of file + callback(changes_detected) diff --git a/ae/utils/gemini_llm_helper.py b/ae/utils/gemini_llm_helper.py index 0cc2dec..d6d4518 100644 --- a/ae/utils/gemini_llm_helper.py +++ b/ae/utils/gemini_llm_helper.py @@ -1,13 +1,11 @@ -import asyncio -from typing import Any -import google.generativeai as genai # type: ignore -from dotenv import load_dotenv import os import re -import json -from ae.utils.logger import logger -from ae.core.prompts import LLM_PROMPTS +from typing import Any +import google.generativeai as genai # type: ignore +from dotenv import load_dotenv + +from ae.utils.logger import logger GCP_BLOCK_NONE_SAFETY_SETTINGS: list[dict[str, str]] = [ { @@ -35,8 +33,7 @@ class GeminiLLMHelper: def __init__(self): load_dotenv() - genai.configure(api_key=os.environ.get("GEMINI_API_KEY")) - +
genai.configure(api_key=os.environ.get("GEMINI_API_KEY")) # type: ignore def process_llm_response(self, response: str): if response: @@ -44,16 +41,14 @@ def process_llm_response(self, response: str): response = llm_json_or_python_begin_response_pattern.sub("", response) response = llm_end_response_pattern.sub("", response) return response - - async def get_chat_completion_response_async(self, system_msg:str, user_msgs:list[str], model_name:str="gemini-1.5-pro-latest", temperature:float=0.1, + async def get_chat_completion_response_async(self, system_msg:str, user_msgs:list[str], model_name:str="gemini-1.5-pro-latest", temperature:float=0.1, max_tokens:int=256, top_p:int=1, top_k: int=1, safety_settings:list[dict[str, str]]=GCP_BLOCK_NONE_SAFETY_SETTINGS) -> str|None: formatted_msgs: list[dict[str, Any]] = [{"role": "user", "parts": [system_msg]}] user_msgs_parts: list[str] = [] for user_msg in user_msgs: user_msgs_parts.append(user_msg) - - + formatted_msgs.append({"role": "user", "parts": user_msgs_parts}) response = None try: @@ -74,6 +69,7 @@ async def get_chat_completion_response_async(self, system_msg:str, user_msgs:lis return None # async def main(): +# from ae.core.prompts import LLM_PROMPTS # helper = GeminiLLMHelper() # response = await helper.get_chat_completion_response_async(LLM_PROMPTS["SKILLS_HARVESTING_PROMPT"], ["What is the weather like today?", "And How are you?"], temperature=0, max_tokens=4000) # print("*******\nResponse: ", response, "\n*******\n") diff --git a/ae/utils/get_detailed_accessibility_tree.py b/ae/utils/get_detailed_accessibility_tree.py index b218d31..f40e6f3 100644 --- a/ae/utils/get_detailed_accessibility_tree.py +++ b/ae/utils/get_detailed_accessibility_tree.py @@ -99,6 +99,9 @@ async def process_node(node: dict[str, Any]): if node['role'] == 'menuitem': return node.get('name') + if node.get('role') == 'dialog' and node.get('modal') == True: # noqa: E712 + node["important information"] = "This is a modal dialog. 
Please interact with this dialog and close it to be able to interact with the full page (e.g. by pressing the close button or selecting an option)." + if mmid: # Determine if we need to fetch 'innerText' based on the absence of 'children' in the accessibility node should_fetch_inner_text = 'children' not in node @@ -122,7 +125,6 @@ async def process_node(node: dict[str, Any]): console.log(`Ignoring element with id: ${element.id}`, element); return null; } - //Ignore "option" because it would have been processed with the select element if (tags_to_ignore.includes(element.tagName.toLowerCase()) || element.tagName.toLowerCase() === "option") return null; @@ -133,7 +135,7 @@ async def process_node(node: dict[str, Any]): // If the element is an input, include its type as well if (element.tagName.toLowerCase() === 'input') { attributes_to_values['tag_type'] = element.type; // This will capture 'checkbox', 'radio', etc. - } + } else if (element.tagName.toLowerCase() === 'select') { attributes_to_values["mmid"] = element.getAttribute('mmid'); attributes_to_values["role"] = "combobox"; @@ -150,7 +152,6 @@ async def process_node(node: dict[str, Any]): } return attributes_to_values; } - for (const attribute of attributes) { let value = element.getAttribute(attribute); @@ -169,6 +170,26 @@ async def process_node(node: dict[str, Any]): attributes_to_values['description'] = element.innerText; } + let role = element.getAttribute('role'); + if(role==='listbox' || element.tagName.toLowerCase()=== 'ul'){ + let children=element.children; + let filtered_children = Array.from(children).filter(child => child.getAttribute('role') === 'option'); + console.log("Listbox or ul found: ", filtered_children); + let attributes_to_include = ['mmid', 'role', 'aria-label','value']; + attributes_to_values["additional_info"]=[] + for (const child of children) { + let children_attributes_to_values = {}; + + for (let attr of child.attributes) { + // If the attribute is not in the predefined list, add 
it to children_attributes_to_values + if (attributes_to_include.includes(attr.name)) { + children_attributes_to_values[attr.name] = attr.value; + } + } + + attributes_to_values["additional_info"].push(children_attributes_to_values); + } + } // Check if attributes_to_values contains more than just 'name', 'role', and 'mmid' const keys = Object.keys(attributes_to_values); const minimalKeys = ['tag', 'mmid']; @@ -194,10 +215,10 @@ async def process_node(node: dict[str, Any]): // Check if the button has no text and no attributes if (element.innerText.trim() === '') { - + for (const child of children) { let children_attributes_to_values = {}; - + for (let attr of child.attributes) { // If the attribute is not in the predefined list, add it to children_attributes_to_values if (!attributes_to_exclude.includes(attr.name)) { @@ -228,7 +249,7 @@ async def process_node(node: dict[str, Any]): if 'keyshortcuts' in node: del node['keyshortcuts'] #remove keyshortcuts since it is not needed - + node["mmid"]=mmid # Update the node with fetched information @@ -241,7 +262,7 @@ async def process_node(node: dict[str, Any]): if 'name' in node and 'description' in node and (node['name'] == node['description'] or node['name'] == node['description'].replace('\n', ' ') or node['description'].replace('\n', '') in node['name']): del node['description'] #if the name is same as description, then remove the description to avoid duplication - + if 'name' in node and 'aria-label' in node and node['aria-label'] in node['name']: del node['aria-label'] #if the name is same as the aria-label, then remove the aria-label to avoid duplication @@ -252,7 +273,7 @@ async def process_node(node: dict[str, Any]): node.pop("children", None) node.pop("role", None) node.pop("description", None) - + #role and tag can have the same info. 
Get rid of role if it is the same as tag if node.get('role') == node.get('tag'): del node['role'] @@ -289,7 +310,7 @@ async def process_node(node: dict[str, Any]): } """ #textbox just means a text input and that is expressed well enough with the rest of the attributes returned - del node['role'] + #del node['role'] #remove attributes that are not needed once processing of a node is complete for attribute_to_delete in attributes_to_delete: @@ -411,11 +432,21 @@ def __should_prune_node(node: dict[str, Any], only_input_fields: bool): if node.get('role') == 'generic' and 'children' not in node and not ('name' in node and node.get('name')): # The presence of 'children' is checked after potentially deleting it above return True - + if node.get('role') in ['separator', 'LineBreak']: return True + processed_name = "" + if 'name' in node: + processed_name:str =node.get('name') # type: ignore + processed_name = processed_name.replace(',', '') + processed_name = processed_name.replace(':', '') + processed_name = processed_name.replace('\n', '') + processed_name = processed_name.strip() + if len(processed_name) <3: + processed_name = "" + #check if the node only have name and role, then delete that node - if len(node) == 2 and 'name' in node and 'role' in node: + if len(node) == 2 and 'name' in node and 'role' in node and not (node.get('role') == "text" and processed_name != ""): return True return False diff --git a/ae/utils/http_helper.py b/ae/utils/http_helper.py new file mode 100644 index 0000000..3520b68 --- /dev/null +++ b/ae/utils/http_helper.py @@ -0,0 +1,43 @@ +from typing import Any + +import requests + + +def make_post_request(url: str, data: dict[str, Any], api_key: str, api_key_header_name: str = "apikey") -> dict[str, Any]|None: + """ + Makes a POST request to the specified URL with a JSON body and an API key header. + + Args: + url (str): The URL to send the POST request to. + data (Dict[str, Any]): The JSON data to include in the POST request body. 
+ api_key (str): The API key to include in the request headers. + api_key_header_name (str): The name of the header to include the API key in. Defaults to "apikey". + + Returns: + Optional[Dict[str, Any]]: The JSON response from the server if the request was successful and the response is in JSON format. + None: If the request failed or the response is not in JSON format. + + Raises: + requests.exceptions.RequestException: If an error occurs during the HTTP request. + """ + # Define the headers for the request + headers = { + 'Content-Type': 'application/json', + api_key_header_name: api_key + } + + try: + # Make the POST request with the given URL, data, and headers + response = requests.post(url, json=data, headers=headers) + + # Check if the request was successful + response.raise_for_status() + + # Attempt to return the JSON response + return response.json() + except requests.exceptions.RequestException as e: + print(f"Error: {e}") + return None + except ValueError: + print("Error: Response is not in JSON format") + return None diff --git a/ae/utils/js_helper.py b/ae/utils/js_helper.py index c473042..3ddc183 100644 --- a/ae/utils/js_helper.py +++ b/ae/utils/js_helper.py @@ -1,4 +1,7 @@ import json +import re + +from ae.utils.logger import logger def escape_js_message(message: str) -> str: @@ -12,3 +15,20 @@ def escape_js_message(message: str) -> str: str: The escaped message. """ return json.dumps(message) + + +def beautify_plan_message(message:str) -> str: + """ + Add a newline between each numbered step in the plan message if it does not already exist. + + Args: + message (str): The plan message. + + Returns: + str: The plan message with newlines added between each numbered step. 
+ """ + logger.debug(f"beautify_plan_message original:\n{message}") + # Add a newline before each numbered step that is not already preceded by a newline + plan_with_newlines = re.sub(r'(?<!\n)( \d+\.)', r'\n\1', message) + logger.debug(f"beautify_plan_message modified:\n{plan_with_newlines}") + return plan_with_newlines diff --git a/ae/utils/logger.py b/ae/utils/logger.py index 5e4d4e0..4674662 100644 --- a/ae/utils/logger.py +++ b/ae/utils/logger.py @@ -1,10 +1,16 @@ import logging logger = logging.getLogger(__name__) -logging.basicConfig( - level=logging.INFO, # change level here or use set_log_level() to change it - format="[%(asctime)s] %(levelname)s {%(filename)s:%(lineno)d} - %(message)s", -) +'''logging.basicConfig( + level=logging.DEBUG, # change level here or use set_log_level() to change it + format="[%(asctime)s] %(levelname)s {%(filename)s:%(lineno)d} - %(message)s", filename='app.log', filemode='a' +)''' +logging.basicConfig(level=logging.INFO) +logging.getLogger("httpcore").setLevel(logging.WARNING) +logging.getLogger("httpx").setLevel(logging.WARNING) +logging.getLogger("matplotlib.pyplot").setLevel(logging.WARNING) +logging.getLogger("PIL.PngImagePlugin").setLevel(logging.WARNING) +logging.getLogger("PIL.Image").setLevel(logging.WARNING) def set_log_level(level: str | int) -> None: """ diff --git a/ae/utils/response_parser.py b/ae/utils/response_parser.py new file mode 100644 index 0000000..bae1e70 --- /dev/null +++ b/ae/utils/response_parser.py @@ -0,0 +1,60 @@ +import json +from typing import Any + +from ae.utils.logger import logger + + +def parse_response(message: str) -> dict[str, Any]: + """ + Parse the response from the browser agent and return the response as a dictionary. 
+ """ + # Parse the response content + json_response = {} + #if message starts with ``` and ends with ``` then remove them + if message.startswith("```"): + message = message[3:] + if message.endswith("```"): + message = message[:-3] + if message.startswith("json"): + message = message[4:] + + message = message.strip() + try: + json_response: dict[str, Any] = json.loads(message) + except Exception as e: + # If the response is not valid JSON, try to parse it using string matching. + #This should seldom be triggered + logger.warning(f"LLM response was not properly formed JSON. Will try to use it as is. LLM response: \"{message}\". Error: {e}") + message = message.replace("\\n", "\n") + message = message.replace("\n", "") # type: ignore + if ("plan" in message and "next_step" in message): + start = message.index("plan") + len("plan") + end = message.index("next_step") + json_response["plan"] = message[start:end].replace('"', '').strip() + if ("next_step" in message and "terminate" in message): + start = message.index("next_step") + len("next_step") + end = message.index("terminate") + json_response["next_step"] = message[start:end].replace('"', '').strip() + if ("terminate" in message and "final_response" in message): + start = message.index("terminate") + len("terminate") + end = message.index("final_response") + matched_string=message[start:end].replace('"', '').strip() + if ("yes" in matched_string): + json_response["terminate"] = "yes" + else: + json_response["terminate"] = "no" + + start=message.index("final_response") + len("final_response") + end=len(message)-1 + json_response["final_response"] = message[start:end].replace('"', '').strip() + + elif ("terminate" in message): + start = message.index("terminate") + len("terminate") + end = len(message)-1 + matched_string=message[start:end].replace('"', '').strip() + if ("yes" in matched_string): + json_response["terminate"] = "yes" + else: + json_response["terminate"] = "no" + + return json_response diff --git
a/ae/utils/ui_messagetype.py b/ae/utils/ui_messagetype.py new file mode 100644 index 0000000..f42d586 --- /dev/null +++ b/ae/utils/ui_messagetype.py @@ -0,0 +1,11 @@ +from enum import Enum + + +# class syntax +class MessageType(Enum): + PLAN = "plan" + STEP = "step" + ACTION ="action" + ANSWER = "answer" + QUESTION= "question" + INFO = "info" diff --git a/pyproject.toml b/pyproject.toml index cf15e49..d6259ce 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -22,7 +22,8 @@ dependencies = [ "pyautogen==0.2.27", "pydantic==2.6.2", "python-dotenv==1.0.0", - "tabulate==0.9.0" + "tabulate==0.9.0", + "nest-asyncio==1.6.0" ] [project.optional-dependencies] diff --git a/requirements.txt b/requirements.txt index d0a6fdf..b513ce0 100644 --- a/requirements.txt +++ b/requirements.txt @@ -91,6 +91,7 @@ idna==3.6 # requests joblib==1.3.2 # via nltk +nest-asyncio==1.6.0 nltk==3.8.1 numpy==1.26.4 # via diff --git a/scripts/aggregate_test_results.py b/scripts/aggregate_test_results.py new file mode 100644 index 0000000..ee576d4 --- /dev/null +++ b/scripts/aggregate_test_results.py @@ -0,0 +1,236 @@ +import argparse +import json +import os +from collections import Counter +from collections import defaultdict +from typing import Any +from typing import List + +import pandas as pd +from pandas.io.formats.style import Styler + +URL_ALIAS_MAP = { + "https://www.allrecipes.com/": "Allrecipes", + "https://www.amazon.com/": "Amazon", + "https://www.apple.com/": "Apple", + "https://arxiv.org/": "Arxiv", + "https://www.bbc.com/news/": "BBC", + "https://www.booking.com/": "Booking", + "https://dictionary.cambridge.org/": "Dictionary", + "https://www.coursera.org/": "Coursera", + "https://www.espn.com/": "ESPN", + "https://github.com/": "GitHub", + "https://www.google.com/travel/flights/": "Flights", + "https://www.google.com/maps/": "Maps", + "https://www.google.com/": "Google", + "https://huggingface.co/": "Hugging Face", + "https://www.wolframalpha.com/": "Wolfram" +} + +def 
find_and_read_json_files(test_results_dir: str, target_directory_name: str) -> list[dict[str, Any]]: + result_data: list[dict[str, Any]] = [] + + # Walk through the test results directory + for root, _dirs, files in os.walk(test_results_dir): + # Check if the target directory is in the current path + if target_directory_name in root: + # If found, iterate through the files in that directory + for file in files: + if file.endswith('.json'): + file_path = os.path.join(root, file) + # Read the JSON file and append its contents to the result_data list + with open(file_path, 'r') as json_file: + print(f"Reading file: {file_path}") + try: + data = json.load(json_file) + result_data.append(data) + except json.JSONDecodeError as e: + print(f"Error decoding JSON from file {file_path}: {e}") + + return result_data + +def save_to_json_file(data: Any, output_file: str): + with open(output_file, 'w') as json_output_file: + json.dump(data, json_output_file, indent=4) + +def extract_alias(url: str) -> str: + for known_url, alias in URL_ALIAS_MAP.items(): + if url.startswith(known_url): + return alias + return "Unknown" + +def count_scores_by_alias(data: list[dict[str, Any]]): + alias_score_counter = defaultdict(Counter) + overall_score_counter = Counter() + for entry in data: + score = entry.get('score') + start_url = entry.get('start_url') + if score is not None: + overall_score_counter[score] += 1 + if start_url: + alias = extract_alias(start_url) + alias_score_counter[alias][score] += 1 + return alias_score_counter, overall_score_counter + +def calculate_percentages(score_counter: Counter) -> tuple[dict[Any, float], int]: + total_count = sum(score_counter.values()) + score_percentages = {score: (count / total_count) * 100 for score, count in score_counter.items()} + return score_percentages, total_count + +def adjust_scores(data: list[dict[str, Any]], task_ids_to_flip: List[int]): + for entry in data: + if entry.get('task_id') in task_ids_to_flip: + if entry.get('score') == 1.0: +
entry['score'] = 0.0 + return data + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Process some JSON files.") + parser.add_argument( + "test_results_dir", + type=str, + help="The base directory containing the test results." + ) + parser.add_argument( + "--target_directory_name", + type=str, + default="results_for_test_results_for_webvoyager_test", + help="The name of the target directory to search within the base directory." + ) + parser.add_argument( + "--output_file", + type=str, + default="compiled_test_results.json", + help="The name of the output file." + ) + parser.add_argument( + "--adjust_task_ids", + type=str, + help="Comma-separated list of task_id values to flip from score 1.0 to 0.0." + ) + + args = parser.parse_args() + + # Derive the full path for the output file + output_file_path = os.path.join(args.test_results_dir, args.output_file) + + # Find and read the JSON files + compiled_data = find_and_read_json_files(args.test_results_dir, args.target_directory_name) + # Sort the compiled data by 'task_index' + sorted_data: list[dict[str, Any]] = sorted(compiled_data, key=lambda x: x.get('task_index', -1)) + + print(f"Number of records found: {len(sorted_data)}") + # Save the compiled data to a JSON file + save_to_json_file(sorted_data, output_file_path) + + # Count the scores by alias and overall + alias_score_counts, overall_score_counts = count_scores_by_alias(sorted_data) + + # Calculate percentages by alias and overall + alias_score_percentages = { + alias: calculate_percentages(score_counter) + for alias, score_counter in alias_score_counts.items() + } + overall_score_percentages, overall_total = calculate_percentages(overall_score_counts) + + # Save the alias score percentages to a JSON file + output_results = { + "overall": { + "percentages": overall_score_percentages, + "counts": dict(overall_score_counts), + "total": overall_total + }, + "by_alias": { + alias: { + "percentages": percentages, + "counts": 
dict(alias_score_counts[alias]), + "total": total + } + for alias, (percentages, total) in alias_score_percentages.items() + } + } + alias_output_file_path = os.path.join(args.test_results_dir, "alias_score_percentages.json") + save_to_json_file(output_results, alias_output_file_path) + + # Print the overall results to the command line + print("\nOverall Score Percentages and Counts (Pre-adjustment):") + print(f"{'Score':<10}{'Percentage':<15}{'Count':<10}") + for score, percentage in overall_score_percentages.items(): + count = overall_score_counts[score] + print(f"{score:<10}{percentage:.2f}%{count:<10}") + + # Adjust scores based on provided task IDs + if args.adjust_task_ids: + task_ids_to_flip = list(map(int, args.adjust_task_ids.split(','))) + sorted_data = adjust_scores(sorted_data, task_ids_to_flip) + + # Recount the scores by alias and overall after adjustment + alias_score_counts_adjusted, overall_score_counts_adjusted = count_scores_by_alias(sorted_data) + + # Recalculate percentages by alias and overall after adjustment + alias_score_percentages_adjusted = { + alias: calculate_percentages(score_counter) + for alias, score_counter in alias_score_counts_adjusted.items() + } + overall_score_percentages_adjusted, overall_total_adjusted = calculate_percentages(overall_score_counts_adjusted) + + # Save the adjusted alias score percentages to a JSON file + adjusted_output_results = { + "overall": { + "percentages": overall_score_percentages_adjusted, + "counts": dict(overall_score_counts_adjusted), + "total": overall_total_adjusted + }, + "by_alias": { + alias: { + "percentages": percentages, + "counts": dict(alias_score_counts_adjusted[alias]), + "total": total + } + for alias, (percentages, total) in alias_score_percentages_adjusted.items() + } + } + adjusted_alias_output_file_path = os.path.join(args.test_results_dir, "adjusted_alias_score_percentages.json") + save_to_json_file(adjusted_output_results, adjusted_alias_output_file_path) + + # Print the 
overall results to the command line post adjustment + print("\nOverall Score Percentages and Counts (Post-adjustment):") + print(f"{'Score':<10}{'Percentage':<15}{'Count':<10}") + for score, percentage in overall_score_percentages_adjusted.items(): + count = overall_score_counts_adjusted[score] + print(f"{score:<10}{percentage:.2f}%{count:<10}") + + # Prepare data for DataFrame post adjustment + data = [] + for score in sorted(set(overall_score_counts_adjusted.keys()).union(*[alias_score_counts_adjusted[alias].keys() for alias in alias_score_counts_adjusted])): + row = {"Score": score} + row["Overall"] = f"{overall_score_percentages_adjusted.get(score, 0):.2f}% ({overall_score_counts_adjusted.get(score, 0)})" + for alias in sorted(URL_ALIAS_MAP.values()): + percentages, _ = alias_score_percentages_adjusted.get(alias, ({}, 0)) + counts = alias_score_counts_adjusted.get(alias, {}) + row[alias] = f"{percentages.get(score, 0):.2f}% ({counts.get(score, 0)})" + data.append(row) + + # Create DataFrame + df = pd.DataFrame(data) + + # Styling the DataFrame + styled_df = df.style.set_table_styles( + [ + {'selector': 'thead th', 'props': 'font-weight: bold; text-align: center;'}, + {'selector': 'th', 'props': 'text-align: center;'}, + {'selector': 'td', 'props': 'text-align: center;'}, + {'selector': 'table', 'props': 'border-collapse: collapse; width: 100%;'}, + {'selector': 'table, th, td', 'props': 'border: 1px solid black;'} + ] + ).set_caption("Benchmark Report") + + # Save to HTML with styled format + html_output_file = os.path.join(args.test_results_dir, "benchmark_report.html") + styled_df.to_html(html_output_file) + + print(f"\nBenchmark report has been saved to: {html_output_file}") + + +# Sample how to run: +# python scripts/aggregate_test_results.py /path/to/folder/agent_e_annotators_tests/round2 --adjust_task_ids "14, 26, 51, 63, 93, 141" \ No newline at end of file diff --git a/test/tasks/annotator_dry_run_webvoyager_tasks_30.json 
b/test/tasks/annotator_dry_run_webvoyager_tasks_30.json new file mode 100644 index 0000000..e4a5b17 --- /dev/null +++ b/test/tasks/annotator_dry_run_webvoyager_tasks_30.json @@ -0,0 +1,812 @@ +[ + { + "sites": null, + "task_id": 15, + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Choose a dessert recipe on Allrecipes with a prep time of less than 30 minutes, has chocolate as an ingredient, and has a user rating of 4 stars or higher. Provide the name of the recipe, ingredients list, and step-by-step instructions.", + "instantiation_dict": {}, + "intent": "Choose a dessert recipe on Allrecipes with a prep time of less than 30 minutes, has chocolate as an ingredient, and has a user rating of 4 stars or higher. Provide the name of the recipe, ingredients list, and step-by-step instructions.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Ultimate Chocolate Dessert', 4.7-star, prep time 15 mins", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Allrecipes--15", + "task_index": 0 + }, + { + "sites": null, + "task_id": 29, + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Search for a Mediterranean-style grilled fish recipe on Allrecipes that includes ingredients like olives, has at least a 4-star rating, and more than 25 reviews. Detail the ingredients, cooking method, and total time required for preparation and cooking.", + "instantiation_dict": {}, + "intent": "Search for a Mediterranean-style grilled fish recipe on Allrecipes that includes ingredients like olives, has at least a 4-star rating, and more than 25 reviews. 
Detail the ingredients, cooking method, and total time required for preparation and cooking.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Branzino Mediterranean', 36 reviews, <Ingredients> include olive oil, <cooking method>, Prep Time: 15 mins, Cook Time: 25 mins, Total Time: 40 mins", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Allrecipes--29", + "task_index": 1 + }, + { + "sites": null, + "task_id": 72, + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Look for a USB-C hub on Amazon compatible with MacBook Pro, featuring at least 4 ports, including HDMI and SD card reader. The price should be under $50. Select the one after sorting by Best Sellers.", + "instantiation_dict": {}, + "intent": "Look for a USB-C hub on Amazon compatible with MacBook Pro, featuring at least 4 ports, including HDMI and SD card reader. The price should be under $50. Select the one after sorting by Best Sellers.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Hiearcool USB C Hub, USB C Multi-Port Adapter for MacBook Pro, 7IN1, include 4K HDMI USB3.0 and SD/TF Card Reader, $24.99", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Amazon--27", + "task_index": 2 + }, + { + "sites": null, + "task_id": 85, + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Locate a women's yoga mat in purple, with a thickness of at least 5mm, rated 4+ stars, and priced under $30 on Amazon. 
Check how many colors are available in total, and what is the return and delivery policy.", + "instantiation_dict": {}, + "intent": "Locate a women's yoga mat in purple, with a thickness of at least 5mm, rated 4+ stars, and priced under $30 on Amazon. Check how many colors are available in total, and what is the return and delivery policy.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "ProsourceFit Extra Thick Yoga Pilates Exercise Mat, 1/2\", 4.6 stars, $21.99, 7 colors, FREE delivery Friday, March 1 on orders shipped by Amazon over $35", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Amazon--40", + "task_index": 3 + }, + { + "sites": null, + "task_id": 97, + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Get information about the latest iPad model released by Apple, including its release date, base storage capacity, and starting price available on Apple's official website.", + "instantiation_dict": {}, + "intent": "Get information about the latest iPad model released by Apple, including its release date, base storage capacity, and starting price available on Apple's official website.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "sixth-generation iPad Pro 11\u2011inch, iPad Pro 12.9\u2011inch; release date: October 26, 2022; base storage capacity 128 GB, starting price $799", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Apple--11", + "task_index": 4 + }, + { + "sites": null, + "task_id": 100, + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Identify the upgrade options available for the cheapest base model of the 
MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "instantiation_dict": {}, + "intent": "Identify the upgrade options available for the cheapest base model of the MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Base model:$1599, difference: $1020", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Apple--14", + "task_index": 5 + }, + { + "sites": null, + "task_id": 168, + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search the title 'GPT-4 Technical Report' and access this paper through HTML format. Read the paper on this page and tell me what is 'one of the main goals of developing such models' mentioned in the Introduction.", + "instantiation_dict": {}, + "intent": "Search the title 'GPT-4 Technical Report' and access this paper through HTML format. 
Read the paper on this page and tell me what is 'one of the main goals of developing such models' mentioned in the Introduction.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "One of the main goals of developing such models is to improve their ability to understand and generate natural language text, particularly in more complex and nuanced scenarios.", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ArXiv--39", + "task_index": 6 + }, + { + "sites": null, + "task_id": 169, + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "How many articles are there on each of the three most recent announce days in the Solar and Stellar Astrophysics section of ArXiv. Choose one at random and answer its title and when the first version was uploaded?", + "instantiation_dict": {}, + "intent": "How many articles are there on each of the three most recent announce days in the Solar and Stellar Astrophysics section of ArXiv. Choose one at random and answer its title and when the first version was uploaded?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "astro-ph.SR paper, latest 3 days", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ArXiv--40", + "task_index": 7 + }, + { + "sites": null, + "task_id": 186, + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find a news article on BBC News about the impact of the recent tech industry layoffs on the global economy. 
Summarize the key points and the name of the author, and provide the date of publication.", + "instantiation_dict": {}, + "intent": "Find a news article on BBC News about the impact of the recent tech industry layoffs on the global economy. Summarize the key points and the name of the author, and provide the date of publication.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<title>, <author>, <summary> (impact of the recent tech industry layoffs on the global economy)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "BBC News--14", + "task_index": 8 + }, + { + "sites": null, + "task_id": 213, + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find Golf in BBC News, check the Leaderboard at this point in Women's Majors and count which country has the most players in the top 20? Which player has the best score amongst the Australian players and in what place.", + "instantiation_dict": {}, + "intent": "Find Golf in BBC News, check the Leaderboard at this point in Women's Majors and count which country has the most players in the top 20? 
Which player has the best score amongst the Australian players and in what place.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Sport - Golf - Leaderboard - Women's Majors, most in top20: American, best in Australian: Grace Kim in 36", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "BBC News--41", + "task_index": 9 + }, + { + "sites": null, + "task_id": 230, + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. Provide the name, location, and price per night.", + "instantiation_dict": {}, + "intent": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. Provide the name, location, and price per night.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Zoku Paris; 48 Avenue de la Porte de Clichy, 17th arr., Paris; US$210 per night", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Booking--16", + "task_index": 10 + }, + { + "sites": null, + "task_id": 232, + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Search a hotel in London with a user rating of 8 or higher for a stay between February 14th, 2024, and February 21st, 2024, suitable for a couple. 
Provide the name and a short description of the hotel.", + "instantiation_dict": {}, + "intent": "Search a hotel in London with a user rating of 8 or higher for a stay between February 14th, 2024, and February 21st, 2024, suitable for a couple. Provide the name and a short description of the hotel.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Cromwell Serviced Apartments; Cromwell Serviced Apartments is an apartment featuring rooms with free Wifi and air conditioning in the center of London", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Booking--18", + "task_index": 11 + }, + { + "sites": null, + "task_id": 262, + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Look for the British English pronunciation of the word \"innovate\" and write down the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the Cambridge Dictionary that uses this word.", + "instantiation_dict": {}, + "intent": "Look for the British English pronunciation of the word \"innovate\" and write down the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the Cambridge Dictionary that uses this word.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /\u02c8\u026an.\u0259.ve\u026at/; Above all, this proposal aims to correct the allocative inefficiencies of the existing patent system, while preserving the dynamic incentives to innovate.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Cambridge Dictionary--4", + "task_index": 12 + }, + { + "sites": null, + "task_id": 281, + "require_login": false, + "storage_state": null, + "start_url": 
"https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Find the US English pronunciation of the word \"meticulous\" using the Cambridge Dictionary and note the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the dictionary using this word.", + "instantiation_dict": {}, + "intent": "Find the US English pronunciation of the word \"meticulous\" using the Cambridge Dictionary and note the International Phonetic Alphabet (IPA) notation, then find one example sentence provided in the dictionary using this word.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "US: /m\u0259\u02c8t\u026ak.j\u0259.l\u0259s/; Many hours of meticulous preparation have gone into writing the book.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Cambridge Dictionary--23", + "task_index": 13 + }, + { + "sites": null, + "task_id": 325, + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Locate an online course on Coursera related to 'Sustainability' that belongs to Physical Science and Engineering subject. The course should include a module on Measuring Sustainability. Note the course duration and the offering institution.", + "instantiation_dict": {}, + "intent": "Locate an online course on Coursera related to 'Sustainability' that belongs to Physical Science and Engineering subject. The course should include a module on Measuring Sustainability. Note the course duration and the offering institution.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Introduction to Sustainability; University of Illinois at Urbana-Champaign; Instructors: Dr. Jonathan Tomkin; duration: Approx. 
25 hours to complete, 3 weeks at 8 hours a week", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Coursera--24", + "task_index": 14 + }, + { + "sites": null, + "task_id": 327, + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Identify a Specialization on Coursera that offers an overview of 'Renewable Energy'. The Specialization should be beginner-level and include a course on Renewable Energy Futures. Note the instructor's name and the number of weeks required to complete the course if I spend 5 hours a week.", + "instantiation_dict": {}, + "intent": "Identify a Specialization on Coursera that offers an overview of 'Renewable Energy'. The Specialization should be beginner-level and include a course on Renewable Energy Futures. Note the instructor's name and the number of weeks required to complete the course if I spend 5 hours a week.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Renewable Energy Specialization; Instructors: Stephen R. Lawrence, Paul Komor; 2 months", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Coursera--26", + "task_index": 15 + }, + { + "sites": null, + "task_id": 373, + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check out the NHL Standings 2023-24 on ESPN to see which teams are at the top and which are at the bottom in Eastern and Western Conference. What about the situation in Division.", + "instantiation_dict": {}, + "intent": "Check out the NHL Standings 2023-24 on ESPN to see which teams are at the top and which are at the bottom in Eastern and Western Conference. 
What about the situation in Division.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "NHL Standings 2023-24, top - bottom, Eastern Conference: New York Rangers - Columbus Blue Jackets; Western Conference: Vancouver Canucks - Chicago Blackhawks; Division: ATLANTIC, Boston Bruins - Montreal Canadiens; METROPOLITAN: New York Rangers - Columbus Blue Jackets; CENTRAL: Dallas Stars - Chicago Blackhawks; PACIFIC: Vancouver Canucks - San Jose Sharks", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ESPN--30", + "task_index": 16 + }, + { + "sites": null, + "task_id": 381, + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check Los Angeles Lakers Stats 2023-24, calculate Anthony Davis' games played (GP) percentage, tell me if there are other players with the same games played percentage as Anthony Davis.", + "instantiation_dict": {}, + "intent": "Check Los Angeles Lakers Stats 2023-24, calculate Anthony Davis' games played (GP) percentage, tell me if there are other players with the same games played percentage as Anthony Davis.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "54/58 = 93.1%, no other players, https://www.espn.com/nba/team/stats/_/name/lal/los-angeles-lakers", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "ESPN--38", + "task_index": 17 + }, + { + "sites": null, + "task_id": 398, + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find a newly created open-source project on GitHub related to 'climate change' that has been initiated in January 2023; check the main programming language used and the project's description.", + 
"instantiation_dict": {}, + "intent": "Find a newly created open-source project on GitHub related to 'climate change' that has been initiated in January 2023; check the main programming language used and the project's description.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "TheAIDojo/AI-for-Climate-Change; Jupyter Notebook; Repository of notebooks and associated code that covers the fundamental concepts of deep learning and its application to climate science.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "GitHub--11", + "task_index": 18 + }, + { + "sites": null, + "task_id": 402, + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Locate a repository on GitHub related to 'quantum computing' that has been updated within the last week and has at least 50 stars. Provide a brief description of the project.", + "instantiation_dict": {}, + "intent": "Locate a repository on GitHub related to 'quantum computing' that has been updated within the last week and has at least 50 stars. Provide a brief description of the project.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "desireevl/awesome-quantum-computing", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "GitHub--15", + "task_index": 19 + }, + { + "sites": null, + "task_id": 462, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Compare business class flight options from Lisbon to Singapore for a one-way trip on March 15, 2024, select one of the flights and see which websites offer its booking options. 
Which one is the cheapest.", + "instantiation_dict": {}, + "intent": "Compare business class flight options from Lisbon to Singapore for a one-way trip on March 15, 2024, select one of the flights and see which websites offer its booking options. Which one is the cheapest.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Emirates, 8:45\u202fPM \u2013 9:15\u202fPM(+1), booking options: Emirates, Gotogate, Martigo, Expedia, kiss&fly, eDreams ... cheapest: Gotogate", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Flights--34", + "task_index": 20 + }, + { + "sites": null, + "task_id": 465, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Locate a round-trip flight from Buenos Aires to Beijing, leaving on February 28, 2024, and returning on March 3, 2024, check out one of the options and tell me if the airline for my return flight is the same as my departure flight.", + "instantiation_dict": {}, + "intent": "Locate a round-trip flight from Buenos Aires to Beijing, leaving on February 28, 2024, and returning on March 3, 2024, check out one of the options and tell me if the airline for my return flight is the same as my departure flight.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Lufthansa, 5:50\u202fPM \u2013 9:30\u202fAM(+2), return flight can be Lufthansa, 11:20\u202fAM \u2013 7:55\u202fAM(+1), the same as departure flight", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Flights--37", + "task_index": 21 + }, + { + "sites": null, + "task_id": 489, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + 
"intent_template": "I will arrive at Pittsburgh Airport soon. Provide the name of the Hilton hotel closest to the airport. Then, tell me the walking time to the nearest supermarket from the hotel.", + "instantiation_dict": {}, + "intent": "I will arrive at Pittsburgh Airport soon. Provide the name of the Hilton hotel closest to the airport. Then, tell me the walking time to the nearest supermarket from the hotel.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Hilton Garden Inn Pittsburgh Airport, walking time around 15min - 30min", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Map--19", + "task_index": 22 + }, + { + "sites": null, + "task_id": 503, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Check out Denver International Airport's information and tell me: 1) which level has the least proportion in reviews; 2) what are its Accessibility and Amenities.", + "instantiation_dict": {}, + "intent": "Check out Denver International Airport's information and tell me: 1) which level has the least proportion in reviews; 2) what are its Accessibility and Amenities.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "star 2 has the least proportion; Accessibility: Assistive hearing loop; Wheelchair accessible entrance; Wheelchair accessible parking lot; Wheelchair accessible restroom; Wheelchair accessible seating; Amenities: Baggage storage; Wi-Fi; Free Wi-Fi", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Map--33", + "task_index": 23 + }, + { + "sites": null, + "task_id": 519, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + 
"intent_template": "Find the video on YouTube: 'Oscars 2023: Must-See Moments!'. Tell me who the first comment displayed under that video belongs to, and how many thumbs up and replies it has.", + "instantiation_dict": {}, + "intent": "Find the video on YouTube: 'Oscars 2023: Must-See Moments!'. Tell me who the first comment displayed under that video belongs to, and how many thumbs up and replies it has.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "user: @melvinsmiley5295, 329 thumbs up and 2 replies (real-time)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Search--8", + "task_index": 24 + }, + { + "sites": null, + "task_id": 544, + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Find and copy the SHA of the latest commit in the TensorFlow repository on GitHub, then find a textbox to paste and tell me what the SHA is.", + "instantiation_dict": {}, + "intent": "Find and copy the SHA of the latest commit in the TensorFlow repository on GitHub, then find a textbox to paste and tell me what the SHA is.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<SHA> of latest Tensorflow", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Google Search--33", + "task_index": 25 + }, + { + "sites": null, + "task_id": 560, + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find the model sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "instantiation_dict": {}, + "intent": "Find the model 
sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "0.550", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Huggingface--6", + "task_index": 26 + }, + { + "sites": null, + "task_id": 571, + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find the most recently updated open-source project related to natural language processing on the Huggingface platform. Provide the project's name, creator, and a brief description of its functionality.", + "instantiation_dict": {}, + "intent": "Find the most recently updated open-source project related to natural language processing on the Huggingface platform. 
Provide the project's name, creator, and a brief description of its functionality.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<model>; <creator>; <description> (recent, NLP)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Huggingface--17", + "task_index": 27 + }, + { + "sites": null, + "task_id": 604, + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Give the final angle and final length after 6s of a Spring pendulum with spring equilibrium length=0.12m, initial length=0.24m, initial angle=80deg, mass=1kg, spring constant=120 N/m .", + "instantiation_dict": {}, + "intent": "Give the final angle and final length after 6s of a Spring pendulum with spring equilibrium length=0.12m, initial length=0.24m, initial angle=80deg, mass=1kg, spring constant=120 N/m .", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "-73.26\u00b0 from vertical; 0.252 m", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Wolfram Alpha--7", + "task_index": 28 + }, + { + "sites": null, + "task_id": 640, + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "instantiation_dict": {}, + "intent": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? 
Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "35; 12", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + }, + "task_alias": "Wolfram Alpha--43", + "task_index": 29 + } +] \ No newline at end of file diff --git a/test/tasks/webvoyager_sampled_data.json b/test/tasks/webvoyager_sampled_data.json new file mode 100644 index 0000000..ed960e4 --- /dev/null +++ b/test/tasks/webvoyager_sampled_data.json @@ -0,0 +1,1627 @@ +[ + { + "sites": null, + "task_id": "Allrecipes--0", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Provide a recipe for vegetarian lasagna with more than 100 reviews and a rating of at least 4.5 stars suitable for 6 people.", + "instantiation_dict": {}, + "intent": "Provide a recipe for vegetarian lasagna with more than 100 reviews and a rating of at least 4.5 stars suitable for 6 people.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Vegetarian Four Cheese Lasagna', 4.6-star, 181 reviews, Servings 8", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--10", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Find The Most Popular Recipes of the 1960s, noting the recipe name, preparation time and total time of the second recipe in this collection.", + "instantiation_dict": {}, + "intent": "Find The Most Popular Recipes of the 1960s, noting the recipe name, preparation time and total time of the second recipe in this collection.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + 
"reference_answers": { + "manual_check": { + "answer": "'Swedish Meatballs I', prep time 25 mins, total time 1 hour 25 mins", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--20", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Find a recipe for a cauliflower pizza crust that has a preparation time of under 30 minutes and a rating of at least 4 stars on Allrecipes. Include the number of calories per serving.", + "instantiation_dict": {}, + "intent": "Find a recipe for a cauliflower pizza crust that has a preparation time of under 30 minutes and a rating of at least 4 stars on Allrecipes. Include the number of calories per serving.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Cauliflower Pizza Crust', 4.2 stars, Prep Time: 15 mins, 59 Calories per serving", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--30", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Find a recipe for a vegan smoothie bowl on Allrecipes that includes bananas and leaves, has more than 20 reviews, and a rating of at least 4 stars. Provide a list of ingredients, preparation time, and a summary of the recipe steps.", + "instantiation_dict": {}, + "intent": "Find a recipe for a vegan smoothie bowl on Allrecipes that includes bananas and leaves, has more than 20 reviews, and a rating of at least 4 stars. 
Provide a list of ingredients, preparation time, and a summary of the recipe steps.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "'Spinach and Banana Power Smoothie', 4.8 stars, 72 reviews, Ingredients: 1 cup plain soy milk, 3/4 cup packed fresh spinach leaves, 1 large banana, sliced; Prep Time: 10 mins; <steps>", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Allrecipes--40", + "require_login": false, + "storage_state": null, + "start_url": "https://www.allrecipes.com/", + "geolocation": null, + "intent_template": "Browse the about us section of Allrecipes for a brief introduction to The Allrecipes Allstars.", + "instantiation_dict": {}, + "intent": "Browse the about us section of Allrecipes for a brief introduction to The Allrecipes Allstars.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "The Allrecipes Allstars: Social media influencers, registered dietitians, grillmasters, and more seasoned home cooks make up our enthusiastic squad of 100+ brand ambassadors. This diverse, food-loving crew spans the U.S. geographically and represents many different cultures, ethnicities, and family makeups. 
Since 2011, the Allstars have created tens of thousands of original recipes, photos, and reviews plus shared their cooking expertise via flat and video content on our website, social media, plus more marketing channels.", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--5", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Find a Blue iPhone 12 Pro 128gb and add to cart.", + "instantiation_dict": {}, + "intent": "Find a Blue iPhone 12 Pro 128gb and add to cart.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Apple iPhone 12 Pro, 128GB, Pacific Blue - Fully Unlocked (Renewed); Action: ADD_TO_CART", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--15", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Find a pair of mens running shoes in black, size 7, 4+ stars and under $50 and add them to my cart on Amazon.", + "instantiation_dict": {}, + "intent": "Find a pair of mens running shoes in black, size 7, 4+ stars and under $50 and add them to my cart on Amazon.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Damyuan Men's Sport Gym Running Shoes Walking Shoes Casual Lace Up Lightweight; black, size 7, 4.0-star, $29.99", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--25", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Search for a queen-sized, hypoallergenic mattress topper on Amazon. 
It should have a memory foam material and be priced between $50 to $100.", + "instantiation_dict": {}, + "intent": "Search for a queen-sized, hypoallergenic mattress topper on Amazon. It should have a memory foam material and be priced between $50 to $100.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "2 Inch 7-Zone Memory Foam Mattress Topper Queen with 100% Bamboo Rayon Cover, Cooling Gel-Infused Swirl Egg Crate Memory Foam, $99.99", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Amazon--35", + "require_login": false, + "storage_state": null, + "start_url": "https://www.amazon.com/", + "geolocation": null, + "intent_template": "Find a men's leather wallet on Amazon with RFID blocking, at least 6 card slots, and priced below $50. Check if it's available for FREE delivery.", + "instantiation_dict": {}, + "intent": "Find a men's leather wallet on Amazon with RFID blocking, at least 6 card slots, and priced below $50. 
Check if it's available for FREE delivery.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "STAY FINE Top Grain Leather Wallet for Men, RFID Blocking, Slim Billfold with 8 Card Slots, FREE delivery Friday, March 1", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--4", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "How much does it cost to buy a Macbook pro, 16-inch, Apple M3 Max chip with 16-core CPU, 40-core GPU, 64GB unified memory, 1TB SSD.", + "instantiation_dict": {}, + "intent": "How much does it cost to buy a Macbook pro, 16-inch, Apple M3 Max chip with 16-core CPU, 40-core GPU, 64GB unified memory, 1TB SSD.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "$4,199.00 or $349.91/mo.per month for 12 mo.*", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--14", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Identify the upgrade options available for the cheapest base model of the MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "instantiation_dict": {}, + "intent": "Identify the upgrade options available for the cheapest base model of the MacBook Pro 14-inch with M3 chip, and calculate the total price difference from the base model to the maximum upgrade (no Pre-Installed Software) offered by Apple.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Base model:$1599, 
difference: $1020", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--24", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Find out the starting price for the most recent model of the iMac on the Apple website.", + "instantiation_dict": {}, + "intent": "Find out the starting price for the most recent model of the iMac on the Apple website.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "$1299.00", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Apple--34", + "require_login": false, + "storage_state": null, + "start_url": "https://www.apple.com/", + "geolocation": null, + "intent_template": "Identify the size and weight for the Apple TV 4K and list the Siri Remote features introduced.", + "instantiation_dict": {}, + "intent": "Identify the size and weight for the Apple TV 4K and list the Siri Remote features introduced.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Height: 1.2 inches (31 mm), Width: 3.66 inches (93 mm), Depth: 3.66 inches (93 mm); Siri Remote features", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--1", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search for the latest research papers on quantum computing submitted to ArXiv in the last 2 days.", + "instantiation_dict": {}, + "intent": "Search for the latest research papers on quantum computing submitted to ArXiv in the last 2 days.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { 
+ "manual_check": { + "answer": "Paper related to quantum computing (latest 2 days)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--11", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "For non-English submissions, do I need to provide a multi-language abstract? If so, what is the separator between the multiple abstracts?", + "instantiation_dict": {}, + "intent": "For non-English submissions, do I need to provide a multi-language abstract? If so, what is the separator between the multiple abstracts?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "-----", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--21", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search for papers on 'neural networks for image processing' in the Computer Science category on ArXiv and report how many were submitted in the last week.", + "instantiation_dict": {}, + "intent": "Search for papers on 'neural networks for image processing' in the Computer Science category on ArXiv and report how many were submitted in the last week.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "cs paper related to 'neural networks for image processing',", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--31", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Search ArXiv for papers with 'Graph Neural Networks' in the abstract that were submitted between Jan 1, 
2024, and Jan 3, 2024, and determine how many of these papers have more than five authors.", + "instantiation_dict": {}, + "intent": "Search ArXiv for papers with 'Graph Neural Networks' in the abstract that were submitted between Jan 1, 2024, and Jan 3, 2024, and determine how many of these papers have more than five authors.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "7 papers", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ArXiv--41", + "require_login": false, + "storage_state": null, + "start_url": "https://arxiv.org/", + "geolocation": null, + "intent_template": "Find the button to share arxiv non-profit store and follow the QR code to share the shop. Then add arXiv Forever short sleeve (XL) to your cart.", + "instantiation_dict": {}, + "intent": "Find the button to share arxiv non-profit store and follow the QR code to share the shop. 
Then add arXiv Forever short sleeve (XL) to your cart.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "QR code image, Action: add to cart", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--8", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Get a brief overview of the economic implications of the UK's latest trade deal posted on BBC News and the date when the article was published.", + "instantiation_dict": {}, + "intent": "Get a brief overview of the economic implications of the UK's latest trade deal posted on BBC News and the date when the article was published.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "CPTPP trade deal, <summary>; 16th July 2023", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--18", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Visit BBC News Audio, What are the best PodCasts for 2023? List 2 of them.", + "instantiation_dict": {}, + "intent": "Visit BBC News Audio, What are the best PodCasts for 2023? 
List 2 of them.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "2 of them: Believe in Magic, The Gift, Vishal, A Very British Cult, People Who Knew Me, History's Secret Heroes", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--28", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find the Market Data section on BBC News and tell me which company the data comes from.", + "instantiation_dict": {}, + "intent": "Find the Market Data section on BBC News and tell me which company the data comes from.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Business - Market Data, Source: Morningstar", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "BBC News--38", + "require_login": false, + "storage_state": null, + "start_url": "https://www.bbc.com/news/", + "geolocation": null, + "intent_template": "Find news related to the storm in Weather section and indicate where and when the severe weather occurred.", + "instantiation_dict": {}, + "intent": "Find news related to the storm in Weather section and indicate where and when the severe weather occurred.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Earth - Weather & Science, article about severe weather, eg, You can't hear it, but this sound can reveal that a tornado is on its way", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--6", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + 
"intent_template": "Book one room which provides breakfast, and airport shuttle from Jan 22 to 25 in Los Angeles.", + "instantiation_dict": {}, + "intent": "Book one room which provides breakfast, and airport shuttle from Jan 22 to 25 in Los Angeles.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "La Quinta by Wyndham LAX", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--16", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. Provide the name, location, and price per night.", + "instantiation_dict": {}, + "intent": "Look for a hotel in Paris with a user rating of 9 or higher and available for a 5-night stay starting January 15, 2024. The hotel should also offer free Wi-Fi and breakfast included in the price. 
Provide the name, location, and price per night.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Zoku Paris; 48 Avenue de la Porte de Clichy, 17th arr., Paris; US$210 per night", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--26", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Check Booking.com for a 3-star hotel or higher in Paris with a guest rating above 8.0 and available parking for dates February 20-23, 2024.", + "instantiation_dict": {}, + "intent": "Check Booking.com for a 3-star hotel or higher in Paris with a guest rating above 8.0 and available parking for dates February 20-23, 2024.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "InterContinental Paris Le Grand, an IHG Hotel, US$2208, 8.6 ratings, 5-star, parking", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Booking--36", + "require_login": false, + "storage_state": null, + "start_url": "https://www.booking.com/", + "geolocation": null, + "intent_template": "Search for a budget hotel in Rome under $100 per night for one adult from March 20 to March 23, 2024. Sort the results by price, identify if any of top three results offer breakfast.", + "instantiation_dict": {}, + "intent": "Search for a budget hotel in Rome under $100 per night for one adult from March 20 to March 23, 2024. 
Sort the results by price, identify if any of top three results offer breakfast.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "ROMA GONDOLA SRLS, US$81, no breakfast", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--2", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Look up the pronunciation, definition, and example sentence for the word \"ubiquitous\" in UK and US English.", + "instantiation_dict": {}, + "intent": "Look up the pronunciation, definition, and example sentence for the word \"ubiquitous\" in UK and US English.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /ju\u02d0\u02c8b\u026ak.w\u026a.t\u0259s/, US: /ju\u02d0\u02c8b\u026ak.w\u0259.t\u032c\u0259s/; seeming to be everywhere; Leather is very much in fashion this season, as is the ubiquitous denim.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--12", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Find the pronunciation, definition, and a sample sentence for the word \"resilience\" in the Cambridge Dictionary.", + "instantiation_dict": {}, + "intent": "Find the pronunciation, definition, and a sample sentence for the word \"resilience\" in the Cambridge Dictionary.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /r\u026a\u02c8z\u026al.j\u0259ns/, US: /r\u026a\u02c8z\u026al.j\u0259ns/; the ability to be happy, successful, etc. 
again after something difficult or bad has happened; Trauma researchers emphasize the resilience of the human psyche.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--22", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Use the Cambridge Dictionary to find the definition, UK pronunciation, and an example sentence for the word \"quintessential.\"", + "instantiation_dict": {}, + "intent": "Use the Cambridge Dictionary to find the definition, UK pronunciation, and an example sentence for the word \"quintessential.\"", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "UK: /\u02cckw\u026an.t\u026a\u02c8sen.\u0283\u0259l/, US:/\u02cckw\u026an.t\u026a\u02c8sen.\u0283\u0259l/; Def: being the most typical example or most important part of something; Sheep's milk cheese is the quintessential Corsican cheese.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--32", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Search for the differences between \"fewer\" and \"less\" in grammar section, and provide examples illustrating their correct usage from the Cambridge Dictionary.", + "instantiation_dict": {}, + "intent": "Search for the differences between \"fewer\" and \"less\" in grammar section, and provide examples illustrating their correct usage from the Cambridge Dictionary.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Article: 'Less or fewer?'; I do less work at weekends than I used to; Better cycle routes would mean fewer cars 
and fewer accidents.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Cambridge Dictionary--42", + "require_login": false, + "storage_state": null, + "start_url": "https://dictionary.cambridge.org/", + "geolocation": null, + "intent_template": "Convert the Cambridge Dictionary homepage from English (UK) to Deutsch.", + "instantiation_dict": {}, + "intent": "Convert the Cambridge Dictionary homepage from English (UK) to Deutsch.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Action: Click English (UK), change language to: Deutsch", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--9", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Identify a Coursera course on artificial intelligence ethics that has a duration of less than 20 hours to complete and has been rated 4+ stars by participants.", + "instantiation_dict": {}, + "intent": "Identify a Coursera course on artificial intelligence ethics that has a duration of less than 20 hours to complete and has been rated 4+ stars by participants.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Artificial Intelligence: Ethics & Societal Challenges", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--19", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Identify a course on Coursera that provides an introduction to Psychology, list the instructor's name, the institution offering it, and how many hours it will approximately take to complete.", + 
"instantiation_dict": {}, + "intent": "Identify a course on Coursera that provides an introduction to Psychology, list the instructor's name, the institution offering it, and how many hours it will approximately take to complete.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Instructor: Paul Bloom; Yale University; 14 hours", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--29", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Browse the Coursera website and find the price required for one year of Coursera Plus. How much is the discount? Then list 3 companies that work with Coursera.", + "instantiation_dict": {}, + "intent": "Browse the Coursera website and find the price required for one year of Coursera Plus. How much is the discount? Then list 3 companies that work with Coursera.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "$399/year, discount: 59 / month * 12 - 399 = 309; Google, IBM, and Imperial College London ...", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Coursera--39", + "require_login": false, + "storage_state": null, + "start_url": "https://www.coursera.org/", + "geolocation": null, + "intent_template": "Find the Space Safety course offered by TUM on Coursera. How many videos are there in module 2? What is the name of each video?", + "instantiation_dict": {}, + "intent": "Find the Space Safety course offered by TUM on Coursera. How many videos are there in module 2? 
What is the name of each video?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "6 videos; Introduction; Space Debris; Mitigation; Measurements; Protection; Atmospheric Re-entry", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--7", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Retrieve the final score and a brief summary of the latest NBA game played by the Los Angeles Lakers as reported on ESPN.", + "instantiation_dict": {}, + "intent": "Retrieve the final score and a brief summary of the latest NBA game played by the Los Angeles Lakers as reported on ESPN.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<score> (latest, Los Angeles Lakers vs xxx); <summary>", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--17", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check out the NBA Basketball Power Index 2023-24 to see which teams are in first place and which are in last place.", + "instantiation_dict": {}, + "intent": "Check out the NBA Basketball Power Index 2023-24 to see which teams are in first place and which are in last place.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Boston Celtics; San Antonio Spurs", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--27", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Search on ESPN 
for how many teams have 'Golden' in their name and how many of them are in the NHL.", + "instantiation_dict": {}, + "intent": "Search on ESPN for how many teams have 'Golden' in their name and how many of them are in the NHL.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "30 teams in search results, 1 team Vegas Golden Knights (NHL)", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "ESPN--37", + "require_login": false, + "storage_state": null, + "start_url": "https://www.espn.com/", + "geolocation": null, + "intent_template": "Check out LeBron James' Stats to see how many games he has played in his career so far.", + "instantiation_dict": {}, + "intent": "Check out LeBron James' Stats to see how many games he has played in his career so far.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "1471", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--3", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find out how much more package storage the Enterprise version has over Team in GitHub Pricing.", + "instantiation_dict": {}, + "intent": "Find out how much more package storage the Enterprise version has over Team in GitHub Pricing.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "48GB", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--13", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Identify the latest top-trending open-source 
project in the category of 'Machine Learning' on GitHub, and check the number of stars it has received.", + "instantiation_dict": {}, + "intent": "Identify the latest top-trending open-source project in the category of 'Machine Learning' on GitHub, and check the number of stars it has received.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "microsoft/ML-For-Beginners", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--23", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find the wiki page of ohmyzsh on GitHub and tell me how to change the theme of zsh to agnoster.", + "instantiation_dict": {}, + "intent": "Find the wiki page of ohmyzsh on GitHub and tell me how to change the theme of zsh to agnoster.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "edit the .zshrc file and set the ZSH_THEME variable to \"agnoster\"", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "GitHub--33", + "require_login": false, + "storage_state": null, + "start_url": "https://github.com/", + "geolocation": null, + "intent_template": "Find Customer Stories on the GitHub page and list the 2 stories that appear on the web page.", + "instantiation_dict": {}, + "intent": "Find Customer Stories on the GitHub page and list the 2 stories that appear on the web page.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Philips builds and deploys digital health technology faster with innersource on GitHub. 
Shopify keeps pushing eCommerce forward with help from GitHub tools.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--2", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Find the lowest fare from all eligible one-way flights for 1 adult from JFK to Heathrow on Jan. 22.", + "instantiation_dict": {}, + "intent": "Find the lowest fare from all eligible one-way flights for 1 adult from JFK to Heathrow on Jan. 22.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Tap Air Portugal 10:00\u202fPM \u2013 5:30\u202fPM(+1), $355 (real-time)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--12", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Find the best-priced round-trip flight from New York to London leaving on December 25, 2023, and returning on January 5, 2024, with one stop or fewer.", + "instantiation_dict": {}, + "intent": "Find the best-priced round-trip flight from New York to London leaving on December 25, 2023, and returning on January 5, 2024, with one stop or fewer.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Norse Atlantic UK, 6:10\u202fPM \u2013 6:00\u202fAM(+1), $757, Nonstop (real-time)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--22", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Find a round-trip flight from 
Rio de Janeiro to Los Angeles, leaving on March 15, 2024, and returning on March 22, 2024, and select the option with the least carbon dioxide emissions.", + "instantiation_dict": {}, + "intent": "Find a round-trip flight from Rio de Janeiro to Los Angeles, leaving on March 15, 2024, and returning on March 22, 2024, and select the option with the least carbon dioxide emissions.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Gol, Aeromexico, 7:00\u202fAM \u2013 10:22\u202fPM, 746 kg CO2", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Flights--32", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/travel/flights/", + "geolocation": null, + "intent_template": "Search for round-trip flights from Stockholm to Toronto, departing on March 3, 2024, and returning on March 10, 2024, and sort the results to find the shortest total travel time.", + "instantiation_dict": {}, + "intent": "Search for round-trip flights from Stockholm to Toronto, departing on March 3, 2024, and returning on March 10, 2024, and sort the results to find the shortest total travel time.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Icelandair, 12:50\u202fPM \u2013 6:15\u202fPM, 11 hr 25 min", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--0", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Find 5 beauty salons with ratings greater than 4.8 in Seattle, WA.", + "instantiation_dict": {}, + "intent": "Find 5 beauty salons with ratings greater than 4.8 in Seattle, WA.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + 
], + "reference_answers": { + "manual_check": { + "answer": "Beehive Salon, Intermezzo Salon & Spa, Cindy's Beauty Salon, The Red Chair Salon, Ella and Oz Salon", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--10", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Search for a park in the state of California called Castle Mountains National Monument and find out its Basic Information.", + "instantiation_dict": {}, + "intent": "Search for a park in the state of California called Castle Mountains National Monument and find out its Basic Information.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "located in Barstow, CA 92311; open 24 hours; phone number is (760) 252-6100", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--20", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Find Tesla Destination Charger closest to the National Air and Space Museum.", + "instantiation_dict": {}, + "intent": "Find Tesla Destination Charger closest to the National Air and Space Museum.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Tesla Destination Charger, 1330 Maryland Ave SW, Washington, DC 20024", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--30", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Locate a parking lot near the Brooklyn Bridge that is open 24 hours. 
Review the user comments about it.", + "instantiation_dict": {}, + "intent": "Locate a parking lot near the Brooklyn Bridge that is open 24 hours. Review the user comments about it.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "2-68 Division St Garage, <reviews>", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Map--40", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/maps/", + "geolocation": null, + "intent_template": "Find a restaurant in Boston that serves Boston lobster and has a rating of 4.6 or higher, and check out what a one-star review says.", + "instantiation_dict": {}, + "intent": "Find a restaurant in Boston that serves Boston lobster and has a rating of 4.6 or higher, and check out what a one-star review says.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Boston Sail Loft, 4.6; one star review: Not sure about the rest of the seafood here since I left immediately after trying their AWFUL Chowder. I won't call it clam chowder since I didn't see a single piece of clam. This stuff was more like if you heated up half & Half then sprinkle dill and salt in it. 
It's too bad the tourist think this is how it's supposed to taste.", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--9", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Show the rating of Prometheus movie on IMDb and Rotten Tomatoes.", + "instantiation_dict": {}, + "intent": "Show the rating of Prometheus movie on IMDb and Rotten Tomatoes.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "IMDb 7.0/10, Rotten Tomatoes 73%", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--19", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "What are the first 7 bits of the SHA of the Bert's latest commit on GitHub, and what exactly was changed in that commit.", + "instantiation_dict": {}, + "intent": "What are the first 7 bits of the SHA of the Bert's latest commit on GitHub, and what exactly was changed in that commit.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "eedf571, Smaller BERT Models", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--29", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Find out the current world record for the men's 100m sprint.", + "instantiation_dict": {}, + "intent": "Find out the current world record for the men's 100m sprint.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "9.58s 
held by Usain Bolt of Jamaica", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Google Search--39", + "require_login": false, + "storage_state": null, + "start_url": "https://www.google.com/", + "geolocation": null, + "intent_template": "Identify the top-10 trending travel destination for 2024 through a blog, how many of them are in Asian.", + "instantiation_dict": {}, + "intent": "Identify the top-10 trending travel destination for 2024 through a blog, how many of them are in Asian.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "Tokyo, Japan; Seoul, South Korea; Halong Bay, Vietnam; Palawan Island, Philippines; Sapa, Vietnam; Bogota, Colombia; Pattaya, Thailand; Alajuela, Costa Rica; Phnom Penh, Cambodia; Kuala Lumpur, Malaysia. Asian: Tokyo, Japan; Seoul, South Korea; Halong Bay, Vietnam; Palawan Island, Philippines; Sapa, Vietnam; Kuala Lumpur, Malaysia; Phnom Penh, Cambodia", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Huggingface--6", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find the model sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "instantiation_dict": {}, + "intent": "Find the model sentence-transformers/all-MiniLM-L6-v2 and use the Inference API on the webpage to get the similarity of the following two sentences: 'Tomorrow is Sunday', 'Eat a burger on Sunday'.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "0.550", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + 
"sites": null, + "task_id": "Huggingface--16", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Find information on the latest (as of today's date) pre-trained language model on Huggingface suitable for text classification and briefly describe its intended use case and architecture.", + "instantiation_dict": {}, + "intent": "Find information on the latest (as of today's date) pre-trained language model on Huggingface suitable for text classification and briefly describe its intended use case and architecture.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "<model> (today, text classification)", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Huggingface--26", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Identify a model on Hugging Face designed for generating travel chats. Obtain information about the model, including its name, size and training framwork.", + "instantiation_dict": {}, + "intent": "Identify a model on Hugging Face designed for generating travel chats. 
Obtain information about the model, including its name, size and training framwork.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "PhilipTheGreat/DiabloGPT-small-Traveller, GPT2LMHeadModel, 510 MB", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Huggingface--36", + "require_login": false, + "storage_state": null, + "start_url": "https://huggingface.co/", + "geolocation": null, + "intent_template": "Summarize all the payment plans and their advantages in huggingface pricing.", + "instantiation_dict": {}, + "intent": "Summarize all the payment plans and their advantages in huggingface pricing.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "summary of https://huggingface.co/pricing", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--3", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Let g(x) be the integral of x^2 cos(2x). Write the expression of g(x) with solution.", + "instantiation_dict": {}, + "intent": "Let g(x) be the integral of x^2 cos(2x). 
Write the expression of g(x) with solution.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "1/4 (2 x cos(2 x) + (-1 + 2 x^2) sin(2 x)) + Constant", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--13", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "What is 10,000 US dollars in 1980 and in 1970 Worth today?", + "instantiation_dict": {}, + "intent": "What is 10,000 US dollars in 1980 and in 1970 Worth today?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "approximately: 36430; 77325", + "type": "possible" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--23", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Calculate the population growth rate of Canada from 2020 to 2023 using Wolfram Alpha.", + "instantiation_dict": {}, + "intent": "Calculate the population growth rate of Canada from 2020 to 2023 using Wolfram Alpha.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "mean population growth rate of Canada from 2020 to 2023 is 0.9998% per year", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--33", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "Identify the electrical energy output of a hydroelectric power plant named Itaipu Dam in 2023 using Wolfram Alpha.", + "instantiation_dict": {}, + "intent": 
"Identify the electrical energy output of a hydroelectric power plant named Itaipu Dam in 2023 using Wolfram Alpha.", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "89.5 TWh (terawatt hours)", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + }, + { + "sites": null, + "task_id": "Wolfram Alpha--43", + "require_login": false, + "storage_state": null, + "start_url": "https://www.wolframalpha.com/", + "geolocation": null, + "intent_template": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "instantiation_dict": {}, + "intent": "A polyominoes of order 6 means you have 6 identical squares to combine different shapes (2-sided). How many combinations are there? Looking at all the shapes in the result, how many of them have only 2 rows in total?", + "require_reset": false, + "eval": { + "eval_types": [ + "manual" + ], + "reference_answers": { + "manual_check": { + "answer": "35; 12", + "type": "golden" + } + }, + "reference_url": null, + "program_html": null + } + } +] \ No newline at end of file diff --git a/test/tasks/webvoyager_test.json b/test/tasks/webvoyager_test.json index d730714..cede577 100644 --- a/test/tasks/webvoyager_test.json +++ b/test/tasks/webvoyager_test.json @@ -6810,9 +6810,9 @@ "storage_state": null, "start_url": "https://www.booking.com/", "geolocation": null, - "intent_template": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 4, 2024, with amenities that include a Parking, breakfast included, and a rating of 8+ on Booking.com.", + "intent_template": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 2, 2024, with amenities that include a Parking, breakfast included, and a 
rating of 8+ on Booking.com.", "instantiation_dict": {}, - "intent": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 4, 2024, with amenities that include a Parking, breakfast included, and a rating of 8+ on Booking.com.", + "intent": "Look up Vienna hotel options with availability for a 4-night stay from September 28 to October 2, 2024, with amenities that include a Parking, breakfast included, and a rating of 8+ on Booking.com.", "require_reset": false, "eval": { "eval_types": [ diff --git a/test/tests_processor.py b/test/tests_processor.py index 222f7d4..64b13b9 100644 --- a/test/tests_processor.py +++ b/test/tests_processor.py @@ -14,6 +14,7 @@ from ae.core.autogen_wrapper import AutogenWrapper from ae.core.playwright_manager import PlaywrightManager from ae.utils.logger import logger +from ae.utils.response_parser import parse_response from autogen.agentchat.chat import ChatResult # type: ignore from playwright.async_api import Page from tabulate import tabulate @@ -97,11 +98,19 @@ def save_individual_test_result(test_result: dict[str, str | int | float | None] def extract_last_response(messages: list[dict[str, Any]]) -> str: """Extract the last response message from chat history.""" - # Iterate over the messages in reverse order - for message in reversed(messages): - if '##TERMINATE##' in message.get('content', ''): - return message['content'].replace("##TERMINATE##", "").strip() - return "" + try: + # Iterate over the messages in reverse order + for message in reversed(messages): + if message and 'content' in message: + content = message.get('content', "") + content_json = parse_response(content) + final_answer = content_json.get('final_response', None) + if final_answer: + return final_answer + return "" + except Exception: + logger.error("Error extracting last response from chat history.") + return "" def print_progress_bar(current: int, total: int, bar_length: int = 50) -> None: @@ -321,9 +330,7 @@ async def 
run_tests(ag: AutogenWrapper, browser_manager: PlaywrightManager, min_ browser_manager = browserManager.PlaywrightManager(headless=False) await browser_manager.async_initialize() - context = await browser_manager.get_browser_context() - page = await context.new_page() # type: ignore - + page=await browser_manager.get_current_page() test_results = [] max_task_index = len(test_configurations) if not max_task_index else max_task_index total_tests = max_task_index - min_task_index
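
Note on the `extract_last_response` change in this diff: the function no longer looks for the `##TERMINATE##` marker. It now walks the chat history backwards and returns the first non-empty `final_response` field it finds in a JSON reply. The sketch below illustrates that lookup in isolation. It is not the project's code: `parse_response_sketch` is a hypothetical stand-in for `ae.utils.response_parser.parse_response`, whose real behaviour is not shown in this diff, and it assumes the reply content is plain JSON.

```python
import json
from typing import Any


def parse_response_sketch(content: Any) -> dict[str, Any]:
    """Hypothetical stand-in for parse_response: plain JSON only (assumption)."""
    try:
        parsed = json.loads(content)
    except (TypeError, ValueError):
        # Non-string content (e.g. None) or text that is not valid JSON.
        return {}
    # Only JSON objects can carry a "final_response" key.
    return parsed if isinstance(parsed, dict) else {}


def extract_last_response(messages: list[dict[str, Any]]) -> str:
    """Return the most recent non-empty final_response, or "" if none exists."""
    # Iterate over the messages in reverse order so the latest reply wins.
    for message in reversed(messages):
        if not message or "content" not in message:
            continue
        final_answer = parse_response_sketch(message["content"]).get("final_response")
        if final_answer:
            return str(final_answer)
    return ""


if __name__ == "__main__":
    history = [
        {"content": '{"plan": "1. Search", "next_step": "Search", "terminate": "no"}'},
        {"content": '{"terminate": "yes", "final_response": "IMDb 7.0/10"}'},
        {"content": None},
    ]
    print(extract_last_response(history))
```

In this sketch a malformed message is skipped and the search continues with older messages, whereas the single `try` around the whole loop in the diff stops at the first parsing error and returns an empty string. Which behaviour is preferable depends on how strict the test harness should be.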