AgentQL is an AI-powered query language for scraping websites and automating workflows. It uses natural language queries to pinpoint data and elements on any web page, including authenticated and dynamically generated content. Users can define structured data output and apply transforms within queries. AgentQL's natural language selectors find elements intuitively based on the content of the web page and work across similar websites, self-healing as the UI changes over time.
- Python and Playwright: AgentQL's Python SDK seamlessly integrates with Playwright for advanced automation and testing.
- Cross-site compatibility lets you use the same query across different sites with similar content.
- Structured output defined by the shape of your query.
- Natural language selectors find elements and data anywhere on a site using intuitive queries.
- Transforms and extracts data in your queries.
- Works on any page, public or private, any site, any URL, even behind authentication.
- Resilience to UI changes means queries work regardless of how a page's structure changes over time.
- Python SDK for running automation and scraping scripts with AgentQL queries.
- Debugger Browser Extension lets you debug and finesse queries in real-time on live sites.
- AgentQL Query Language lets you define queries with natural language.
- Playground for experimenting with AgentQL lets you export Python scripts and optimize queries with prompts.
- Install Python SDK and dependencies via your terminal:
    pip3 install agentql
    agentql init
- Copy and paste your API key into the terminal.
- Save one of the following scripts as example.py and run the following from your terminal:
    python3 example.py
Data extraction with query_data
import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://scrapeme.live/shop/")

    # use your own words to describe what you're looking for
    QUERY = """
    {
        products[] {
            name
            price
        }
    }
    """

    # query_data returns data from the page
    response = page.query_data(QUERY)
    print(response)
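Because the output mirrors the shape of the query, the result can be post-processed like ordinary Python data. The sketch below assumes `query_data` returns a dictionary with a `products` list whose items carry `name` and `price` as strings. The sample values and the `parse_price` and `to_csv` helpers are hypothetical and are not part of AgentQL.

```python
import csv
import io
import re


def parse_price(text):
    """Pull the first number out of a price string such as '£10.00'."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def to_csv(response):
    """Serialize the products list to CSV text with a numeric price column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "price"])
    for product in response.get("products", []):
        writer.writerow([product["name"], parse_price(product["price"])])
    return buffer.getvalue()


# Hypothetical stand-in for the value returned by page.query_data(QUERY)
sample_response = {
    "products": [
        {"name": "Example item A", "price": "£10.00"},
        {"name": "Example item B", "price": "£25.50"},
    ]
}

csv_text = to_csv(sample_response)
print(csv_text)
```

In a real script, `sample_response` would be replaced by the `response` value from the example above.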
Automation with get_by_prompt and query_elements
import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://duckduckgo.com")

    # use your own words to describe what you're looking for
    QUERY = """
    {
        search_box
        search_button
    }
    """

    # query_elements returns multiple elements to perform operations on
    response = page.query_elements(QUERY)
    response.search_box.fill("AgentQL")
    response.search_button.click()

    # get_by_prompt returns one element to perform operations on based on the content you pass to it
    images = page.get_by_prompt("images link")
    images.click()

    # Used only for demo purposes. It allows you to see the effect of the script.
    page.wait_for_timeout(10000)
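A prompt may not match anything on the page, so it can help to guard the click. The sketch below assumes `get_by_prompt` returns `None` when no element is found, which is worth confirming against the SDK reference. `click_if_found`, `FakeElement`, and `FakePage` are hypothetical names used only so the snippet runs without a browser.

```python
def click_if_found(page, prompt):
    """Click the element matching `prompt`; return False if nothing matched."""
    element = page.get_by_prompt(prompt)
    if element is None:
        return False
    element.click()
    return True


class FakeElement:
    """Hypothetical stand-in for a located element."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


class FakePage:
    """Hypothetical stand-in for an AgentQL-wrapped page."""

    def __init__(self, elements):
        self.elements = elements

    def get_by_prompt(self, prompt):
        # Assumption: a missing element is reported as None
        return self.elements.get(prompt)


images_link = FakeElement()
page = FakePage({"images link": images_link})
print(click_if_found(page, "images link"))  # True
print(click_if_found(page, "videos link"))  # False
```

With a real session, the object returned by `agentql.wrap(browser.new_page())` would take the place of `FakePage`.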
- Getting started with AgentQL
- Debug AgentQL script
- Run script in headless browser
- Run script with an external or existing browser
- Run script online in Google Colaboratory
- Compare product prices across different websites
- Save and reuse logged in state
- Wait for page to load
- Close popup windows (like promotion form)
- Close cookie dialog
- Leverage List Query
- Leverage get_by_prompt method
- Log into Site
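As a rough illustration of the price-comparison example listed above, the same query can be run against several sites and the results merged. This is a minimal sketch, assuming `query_data` returns a dictionary shaped like the query. The `collect_products` helper, the `FakePage` stub, the `.example` URLs, and the canned data are all hypothetical and exist only so the snippet runs offline.

```python
QUERY = """
{
    products[] {
        name
        price
    }
}
"""


def collect_products(page, urls, query=QUERY):
    """Run the same query on each URL and tag every product with its source."""
    rows = []
    for url in urls:
        page.goto(url)
        data = page.query_data(query)
        for product in data.get("products", []):
            rows.append({"site": url, **product})
    return rows


class FakePage:
    """Hypothetical stub that mimics goto/query_data with canned data."""

    def __init__(self, canned):
        self.canned = canned
        self.current = None

    def goto(self, url):
        self.current = url

    def query_data(self, query):
        return self.canned[self.current]


# Made-up data standing in for two shops with similar content
canned = {
    "https://shop-a.example/": {"products": [{"name": "Widget", "price": "$10.00"}]},
    "https://shop-b.example/": {"products": [{"name": "Widget", "price": "$9.50"}]},
}
rows = collect_products(FakePage(canned), list(canned))
for row in rows:
    print(row["site"], row["name"], row["price"])
```

Passing an AgentQL-wrapped Playwright page instead of `FakePage` would run the query against live sites.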
For comprehensive guides and API references, check out our official documentation.
If you find AgentQL helpful, please consider giving us a star on GitHub! It helps us reach more developers and continue improving the project.
For questions, feedback, or support, join our Discord community. You can follow us on GitHub, Twitter, and LinkedIn!