Update README.md #180

Open
wants to merge 1 commit into base: main



bbarclay

Running the gpt-crawler from an External Script

This example shows how to use the core functions of the gpt-crawler package outside of its CLI by importing them directly in a Node.js script. Because gpt-crawler is an ES module, a script running in a CommonJS environment needs to load it with a dynamic import(), as shown here.

// test-direct-call.js (using dynamic import in CommonJS)
(async () => {
    try {
        // Dynamically import the ES module
        const { crawl, write } = await import('./node_modules/@builder.io/gpt-crawler/dist/src/core.js');

        // Define your custom configuration for the crawl
        const config = {
            url: "https://example.com",
            match: "/articles/",
            selector: "h1",
            maxPagesToCrawl: 10,
            outputFileName: "output.json",
            maxTokens: 5000,   // Optional for token limit logic
            maxFileSize: 5,    // Maximum file size in MB
        };

        // Call the crawl function directly from the core.js file
        console.log("Starting crawl...");
        await crawl(config);
        console.log("Crawl complete.");

        // Call the write function to store results
        console.log("Writing output...");
        await write(config);
        console.log("Output written to:", config.outputFileName);

    } catch (error) {
        console.error("An error occurred:", error.message);
    }
})();
