🕸️ @isdk/web-fetcher

An AI-friendly web automation library that expresses complex web interactions as declarative JSON action scripts. Write a script once and run it in either the fast `http` mode for static content or the full `browser` mode for dynamic sites. An optional `antibot` flag helps bypass bot-detection mechanisms. The library is designed for targeted, task-oriented data extraction (e.g., get X from page Y), not for building whole-site crawlers.

✨ Core Features

⚙️ Dual-Engine Architecture: Choose between `http` mode (powered by Cheerio) for speed on static sites, or `browser` mode (powered by Playwright) for full JavaScript execution on dynamic sites.
📜 Declarative Action Scripts: Define multi-step workflows (like logging in, filling forms, and clicking buttons) in a simple, readable JSON format.
📊 Powerful and Flexible Data Extraction: Extract structured data, from simple text to complex nested objects, using a declarative schema.
🧠 Smart Engine Selection: Automatically detects dynamic sites and can upgrade the engine from `http` to `browser` on the fly.
🧩 Extensible: Create custom, high-level "composite" actions to encapsulate reusable business logic (e.g., a login action).
🧲 Advanced Collectors: Asynchronously collect data in the background, triggered by events during the execution of a main action.
🛡️ Anti-Bot Evasion: In `browser` mode, an optional `antibot` flag helps bypass common anti-bot measures such as Cloudflare challenges.
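
To make the "write once, run in either mode" idea concrete, here is a minimal sketch. The `ActionStep` and `ScriptOptions` types and the `withEngine` helper are simplified stand-ins invented for illustration, not the library's exported types; only the `url`, `engine`, and `actions` option names and the `extract` action shape come from this README.

```typescript
// Simplified stand-ins for the library's types (assumptions for illustration only).
type EngineMode = 'http' | 'browser' | 'auto';

interface ActionStep {
  id: string;
  params?: Record<string, unknown>;
  storeAs?: string;
}

interface ScriptOptions {
  url: string;
  engine: EngineMode;
  actions: ActionStep[];
}

// A single declarative script, written once.
const titleScript: ActionStep[] = [
  { id: 'extract', params: { selector: 'title' }, storeAs: 'pageTitle' },
];

// Hypothetical helper: pair the same script with a chosen engine.
function withEngine(url: string, engine: EngineMode, actions: ActionStep[]): ScriptOptions {
  return { url, engine, actions };
}

// The same script object is reused for both engines.
const staticRun = withEngine('https://example.com', 'http', titleScript);
const dynamicRun = withEngine('https://example.com', 'browser', titleScript);

// Because the script is plain data, it survives a JSON round trip unchanged.
const roundTripped: ActionStep[] = JSON.parse(JSON.stringify(titleScript));
```

Either options object could then be passed to `fetchWeb` as shown in the Quick Start; only the `engine` value differs between the two runs.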

📦 Installation

Install the Package:
```
npm install @isdk/web-fetcher
```
Install Browsers (For browser mode):

The browser engine is powered by Playwright, which requires separate browser binaries to be downloaded. If you plan to use the browser engine for interacting with dynamic websites, run the following command:
```
npx playwright install
```
ℹ️ Note: This step is only required for browser mode. The lightweight http mode works out of the box without this installation.

🚀 Quick Start

The following example fetches a web page and extracts its title.

```typescript
import { fetchWeb } from '@isdk/web-fetcher';

async function getTitle(url: string) {
  const { outputs } = await fetchWeb({
    url,
    actions: [
      {
        id: 'extract',
        params: {
          // Extracts the text content of the <title> tag
          selector: 'title',
        },
        // Stores the result in the `outputs` object under the key 'pageTitle'
        storeAs: 'pageTitle',
      },
    ],
  });

  console.log('Page Title:', outputs.pageTitle);
}

// Report failures instead of leaving the promise rejection unhandled
getTitle('https://www.google.com').catch(console.error);
```

🤖 Advanced Usage: Multi-Step Form Submission

This example demonstrates how to use the browser engine to perform a search on Google.

```typescript
import { fetchWeb } from '@isdk/web-fetcher';

async function searchGoogle(query: string) {
  // Search for the query on Google
  const { result, outputs } = await fetchWeb({
    url: 'https://www.google.com',
    engine: 'browser', // Use the full browser engine for interaction
    actions: [
      // The initial navigation to google.com is handled by the `url` option
      { id: 'fill', params: { selector: 'textarea[name=q]', value: query } },
      { id: 'submit', params: { selector: 'form' } },
      { id: 'waitFor', params: { selector: '#search' } }, // Wait for the search results container to appear
      { id: 'getContent', storeAs: 'searchResultsPage' },
    ]
  });

  console.log('Search Results URL:', result?.finalUrl);
  // `outputs.searchResultsPage` holds the full page content; print only the first 100 characters
  console.log('Start of the page HTML:', outputs.searchResultsPage.html.substring(0, 100));
}

// Report failures instead of leaving the promise rejection unhandled
searchGoogle('gemini').catch(console.error);
```

🏗️ Architecture

This library is built on two core concepts: Engines and Actions.

Engine Architecture

The library's core is its dual-engine design. It abstracts away the complexities of web interaction behind a unified API. For detailed information on the `http` (Cheerio) and `browser` (Playwright) engines, how they manage state, and how to extend them, please see the Fetch Engine Architecture document.

Action Architecture

All workflows are defined as a series of "Actions". The library provides a set of built-in atomic actions and a powerful composition model for creating your own semantic actions. For a deep dive into creating and using actions, see the Action Script Architecture document.
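
As a rough illustration of composing atomic actions into a reusable, semantic unit, the sketch below builds a "login" sequence with a plain function. This is not the library's composition API (that is covered in the Action Script Architecture document); `loginSteps`, the `ActionStep` type, and all selectors are hypothetical placeholders. Only the `fill`, `submit`, `waitFor`, and `getContent` action shapes follow the examples in this README.

```typescript
// Simplified stand-in for an action entry (assumption for illustration only).
interface ActionStep {
  id: string;
  params?: Record<string, unknown>;
  storeAs?: string;
}

// Hypothetical helper: expands a semantic "login" step into atomic actions.
// The selectors are placeholders and would differ on a real site.
function loginSteps(username: string, password: string): ActionStep[] {
  return [
    { id: 'fill', params: { selector: '#username', value: username } },
    { id: 'fill', params: { selector: '#password', value: password } },
    { id: 'submit', params: { selector: 'form' } },
  ];
}

// The expanded steps slot into a larger script like any other actions.
const script: ActionStep[] = [
  ...loginSteps('alice', 's3cret'),
  { id: 'waitFor', params: { selector: '#dashboard' } },
  { id: 'getContent', storeAs: 'dashboardPage' },
];
```

Because an action script is plain data, this kind of reuse works with ordinary array spreading; a registered composite action would instead let the script refer to the login step by a single `id`.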

📚 API Reference

`fetchWeb(options)` or `fetchWeb(url, options)`

This is the main entry point for the library.

Key FetcherOptions:

`url` (`string`): The initial URL to navigate to.
`engine` (`'http' | 'browser' | 'auto'`): The engine to use. Defaults to `auto`.
`actions` (`FetchActionOptions[]`): An array of action objects to execute.
`headers` (`Record<string, string>`): Headers to use for all requests.
...and many other options for proxy, cookies, retries, etc.
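
The two call forms and the `auto` default can be pictured with the sketch below. `resolveOptions` and `SketchOptions` are hypothetical names invented here to show how `fetchWeb(url, options)` and `fetchWeb(options)` could describe the same request; the library's actual argument handling may differ.

```typescript
// Simplified stand-ins covering only the options listed above (assumptions).
interface ActionStep {
  id: string;
  params?: Record<string, unknown>;
  storeAs?: string;
}

interface SketchOptions {
  url?: string;
  engine?: 'http' | 'browser' | 'auto';
  actions?: ActionStep[];
  headers?: Record<string, string>;
}

// Hypothetical: fold both call forms into one options object,
// applying the documented default of engine = 'auto'.
function resolveOptions(
  urlOrOptions: string | SketchOptions,
  options: SketchOptions = {},
): SketchOptions {
  const merged: SketchOptions =
    typeof urlOrOptions === 'string'
      ? { ...options, url: urlOrOptions }
      : { ...urlOrOptions };
  return { engine: 'auto', ...merged };
}

const headers = { 'User-Agent': 'example-agent/1.0' };

// Both forms describe the same request.
const fromUrlForm = resolveOptions('https://example.com', { headers });
const fromObjectForm = resolveOptions({ url: 'https://example.com', headers });

// An explicit engine overrides the default.
const forcedHttp = resolveOptions({ url: 'https://example.com', engine: 'http' });
```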

Built-in Actions

Here are the essential built-in actions:

`goto`: Navigates to a new URL.
`click`: Clicks on an element specified by a selector.
`fill`: Fills an input field with a specified value.
`submit`: Submits a form.
`waitFor`: Pauses execution to wait for a specific condition (e.g., a timeout, a selector to appear, or network to be idle).
`pause`: Pauses execution for manual intervention (e.g., solving a CAPTCHA).
`getContent`: Retrieves the full content (HTML, text, etc.) of the current page state.
`extract`: Extracts structured data from the page using a declarative schema.
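
The sketch below chains several of these actions into one script and checks each step's `id` against the names listed above. The `unknownActionIds` helper and the `ActionStep` type are hypothetical, the selectors and URL are placeholders, and the `url` parameter name for `goto` is an assumption, since this README does not show that action's parameters.

```typescript
// Action names taken from the list above.
const BUILT_IN_IDS = [
  'goto', 'click', 'fill', 'submit', 'waitFor', 'pause', 'getContent', 'extract',
] as const;

// Simplified stand-in for an action entry (assumption for illustration only).
interface ActionStep {
  id: string;
  params?: Record<string, unknown>;
  storeAs?: string;
}

// Hypothetical pre-flight check: list step ids that are not built-in names.
function unknownActionIds(script: ActionStep[]): string[] {
  const known = new Set<string>(BUILT_IN_IDS);
  return script.filter((step) => !known.has(step.id)).map((step) => step.id);
}

const script: ActionStep[] = [
  // The `url` parameter name is an assumption; check the action docs.
  { id: 'goto', params: { url: 'https://example.com/articles' } },
  { id: 'click', params: { selector: 'a.first-article' } },
  { id: 'waitFor', params: { selector: 'article' } },
  { id: 'extract', params: { selector: 'h1' }, storeAs: 'headline' },
];

const problems = unknownActionIds(script);
const withTypo = unknownActionIds([...script, { id: 'scroll' }]);
```

Here `problems` is empty and `withTypo` contains `'scroll'`. A check like this would also flag custom composite actions, so it only suits scripts that use built-in actions exclusively.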

📜 License

MIT
