An AI-friendly web automation library that simplifies complex web interactions into a declarative JSON action script. Write your script once and run it in either a fast `http` mode for static content or a full `browser` mode for dynamic sites. An optional `antibot` flag helps bypass detection mechanisms. The library is designed for targeted, task-oriented data extraction (e.g., get X from page Y), not for building whole-site crawlers.
- ⚙️ Dual-Engine Architecture: Choose between `http` mode (powered by Cheerio) for speed on static sites, or `browser` mode (powered by Playwright) for full JavaScript execution on dynamic sites.
- 📜 Declarative Action Scripts: Define multi-step workflows (like logging in, filling forms, and clicking buttons) in a simple, readable JSON format.
- 📊 Powerful and Flexible Data Extraction: Extract structured data, from simple text to complex nested objects, through a declarative Schema.
- 🧠 Smart Engine Selection: Automatically detects dynamic sites and can upgrade the engine from `http` to `browser` on the fly.
- 🧩 Extensible: Create custom, high-level "composite" actions to encapsulate reusable business logic (e.g., a `login` action).
- 🧲 Advanced Collectors: Asynchronously collect data in the background, triggered by events during the execution of a main action.
- 🛡️ Anti-Bot Evasion: In `browser` mode, an optional `antibot` flag helps bypass common anti-bot measures like Cloudflare challenges.
- Install the Package: `npm install @isdk/web-fetcher`
- Install Browsers (for `browser` mode): The `browser` engine is powered by Playwright, which requires separate browser binaries to be downloaded. If you plan to use the `browser` engine for interacting with dynamic websites, run the following command: `npx playwright install`
  ℹ️ Note: This step is only required for `browser` mode. The lightweight `http` mode works out of the box without this installation.
The following example fetches a web page and extracts its title.
import { fetchWeb } from '@isdk/web-fetcher';
async function getTitle(url: string) {
  const { outputs } = await fetchWeb({
    url,
    actions: [
      {
        id: 'extract',
        params: {
          // Extracts the text content of the <title> tag
          selector: 'title',
        },
        // Stores the result in the `outputs` object under the key 'pageTitle'
        storeAs: 'pageTitle',
      },
    ],
  });
  console.log('Page Title:', outputs.pageTitle);
}
getTitle('https://www.google.com');

This example demonstrates how to use the browser engine to perform a search on Google.
import { fetchWeb } from '@isdk/web-fetcher';
async function searchGoogle(query: string) {
  // Search for the query on Google
  const { result, outputs } = await fetchWeb({
    url: 'https://www.google.com',
    engine: 'browser', // Use the full browser engine for interaction
    actions: [
      // The initial navigation to google.com is handled by the `url` option
      { id: 'fill', params: { selector: 'textarea[name=q]', value: query } },
      { id: 'submit', params: { selector: 'form' } },
      { id: 'waitFor', params: { selector: '#search' } }, // Wait for the search results container to appear
      { id: 'getContent', storeAs: 'searchResultsPage' },
    ]
  });
  console.log('Search Results URL:', result?.finalUrl);
  console.log('Outputs contains the full page content:', outputs.searchResultsPage.html.substring(0, 100));
}
searchGoogle('gemini');

This library is built on two core concepts: Engines and Actions.
- The library's core is its dual-engine design. It abstracts away the complexities of web interaction behind a unified API. For detailed information on the `http` (Cheerio) and `browser` (Playwright) engines, how they manage state, and how to extend them, please see the Fetch Engine Architecture document.
- All workflows are defined as a series of "Actions". The library provides a set of built-in atomic actions and a composition model for creating your own semantic actions. For a deep dive into creating and using actions, see the Action Script Architecture document.
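
To make the composition idea concrete, here is a minimal sketch in which a reusable "login" step is a plain function that expands into atomic actions. This is ordinary function composition, not the library's own mechanism for registering composite actions (that is covered in the Action Script Architecture document). The `loginSteps` helper, the `Step` type, the selectors, and the credentials are all hypothetical.

```typescript
// Simplified stand-in for an action object (an assumption for this sketch).
type Step = { id: string; params?: Record<string, unknown>; storeAs?: string };

// Hypothetical helper: expands a high-level "login" into atomic steps.
// The selectors are placeholders for whatever the target page uses.
function loginSteps(username: string, password: string): Step[] {
  return [
    { id: 'fill', params: { selector: '#username', value: username } },
    { id: 'fill', params: { selector: '#password', value: password } },
    { id: 'submit', params: { selector: 'form' } },
    { id: 'waitFor', params: { selector: '#dashboard' } },
  ];
}

// Reuse the login steps at the start of a larger script.
const dashboardScript: Step[] = [
  ...loginSteps('alice', 'secret'),
  { id: 'getContent', storeAs: 'dashboardPage' },
];

console.log(dashboardScript.map((s) => s.id).join(' -> '));
// prints "fill -> fill -> submit -> waitFor -> getContent"
```

Keeping the expansion in one function means a change to the login form only has to be made in one place.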
 
`fetchWeb` is the main entry point for the library.
Key `FetcherOptions`:
- `url` (`string`): The initial URL to navigate to.
- `engine` (`'http' | 'browser' | 'auto'`): The engine to use. Defaults to `auto`.
- `actions` (`FetchActionOptions[]`): An array of action objects to execute.
- `headers` (`Record<string, string>`): Headers to use for all requests.
- ...and many other options for proxy, cookies, retries, etc.
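
As an illustration of how these options fit together, the sketch below assembles an options object as plain data. The local `SketchOptions` and `SketchAction` types are simplified stand-ins written for this example, not the library's exported types, and the URL and header value are placeholders.

```typescript
// Simplified stand-ins for the library's option types (assumptions for this sketch).
type SketchAction = {
  id: string;
  params?: Record<string, unknown>;
  storeAs?: string;
};

type SketchOptions = {
  url: string;
  engine?: 'http' | 'browser' | 'auto';
  actions?: SketchAction[];
  headers?: Record<string, string>;
};

// Build the options for a one-off "get the title of this page" task.
function buildTitleOptions(url: string): SketchOptions {
  return {
    url,
    engine: 'auto', // the documented default, written out for clarity
    headers: { 'Accept-Language': 'en-US' }, // placeholder header
    actions: [
      { id: 'extract', params: { selector: 'title' }, storeAs: 'pageTitle' },
    ],
  };
}

const titleOptions = buildTitleOptions('https://example.com');
console.log(JSON.stringify(titleOptions, null, 2));

// With the package installed, the object would be passed to the entry point:
//   const { outputs } = await fetchWeb(titleOptions);
```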
 
Here are the essential built-in actions:
- `goto`: Navigates to a new URL.
- `click`: Clicks on an element specified by a selector.
- `fill`: Fills an input field with a specified value.
- `submit`: Submits a form.
- `waitFor`: Pauses execution to wait for a specific condition (e.g., a timeout, a selector to appear, or network to be idle).
- `pause`: Pauses execution for manual intervention (e.g., solving a CAPTCHA).
- `getContent`: Retrieves the full content (HTML, text, etc.) of the current page state.
- `extract`: Extracts structured data from the page using an expressive, declarative schema.
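
These actions can be chained into a single script. The sketch below writes one as plain data and checks its ids against the list above. The `fill`, `submit`, `waitFor`, and `getContent` shapes follow the Google search example earlier in this README. The `url` parameter name for `goto`, the `ActionStep` type, the `unknownActionIds` helper, and the example.com URL and selectors are assumptions made for illustration.

```typescript
// Simplified stand-in for an action object (an assumption for this sketch).
type ActionStep = { id: string; params?: Record<string, unknown>; storeAs?: string };

// The built-in action ids listed in this README.
const BUILT_IN_IDS: string[] = [
  'goto', 'click', 'fill', 'submit', 'waitFor', 'pause', 'getContent', 'extract',
];

// A hypothetical search workflow on a placeholder site.
const searchScript: ActionStep[] = [
  { id: 'goto', params: { url: 'https://example.com/search' } }, // `url` param name is assumed
  { id: 'fill', params: { selector: 'input[name=q]', value: 'web automation' } },
  { id: 'submit', params: { selector: 'form' } },
  { id: 'waitFor', params: { selector: '#results' } },
  { id: 'getContent', storeAs: 'resultsPage' },
];

// Return the ids used in a script that are not in the built-in list,
// which is a cheap way to catch typos before running anything.
function unknownActionIds(script: ActionStep[]): string[] {
  return script
    .map((step) => step.id)
    .filter((id) => BUILT_IN_IDS.indexOf(id) === -1);
}

console.log(unknownActionIds(searchScript)); // prints []
```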