feat: add infinite scrolling #39

Merged on Jun 19, 2025 (1 commit)
24 changes: 24 additions & 0 deletions scrapegraph-js/README.md
@@ -107,6 +107,30 @@ const schema = z.object({
})();
```

#### Scraping with Infinite Scrolling

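For websites that load content dynamically through infinite scrolling (such as social media feeds), pass the optional `numberOfScrolls` argument. It is the fifth positional argument of `smartScraper`, after `schema`, so pass `null` for `schema` if you are not using one: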

```javascript
import { smartScraper } from 'scrapegraph-js';

const apiKey = 'your-api-key';
const url = 'https://example.com/infinite-scroll-page';
const prompt = 'Extract all the posts from the feed';
const numberOfScrolls = 10; // Will scroll 10 times to load more content

(async () => {
  try {
    const response = await smartScraper(apiKey, url, prompt, null, numberOfScrolls);
    console.log('Extracted data from scrolled page:', response);
  } catch (error) {
    console.error('Error:', error);
  }
})();
```

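`numberOfScrolls` must be an integer from 0 to 100 (inclusive) and sets how many times the page is scrolled before extraction. If it is omitted or `null`, no scrolling is performed; a non-integer or out-of-range value makes `smartScraper` throw an error before any request is sent.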
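Because an out-of-range or non-integer value makes `smartScraper` throw before any request is sent, it can help to check user-supplied input up front. The sketch below assumes the value arrives as text, for example a command-line argument; `parseNumberOfScrolls` is a hypothetical helper (not part of `scrapegraph-js`) that mirrors the 0 to 100 integer rule, and `scrape.js` is a placeholder file name.

```javascript
// Hypothetical helper (not part of scrapegraph-js): converts raw text input,
// such as a command-line argument, into a value smartScraper accepts.
function parseNumberOfScrolls(raw) {
  // Missing or blank input maps to null, which means "do not scroll".
  if (raw === undefined || raw === null || String(raw).trim() === '') {
    return null;
  }
  const value = Number(raw);
  // Same rule smartScraper enforces: an integer from 0 to 100 inclusive.
  if (!Number.isInteger(value) || value < 0 || value > 100) {
    throw new RangeError(`numberOfScrolls must be an integer between 0 and 100, got "${raw}"`);
  }
  return value;
}

// Usage: `node scrape.js 10` scrolls 10 times; `node scrape.js` scrolls none.
const numberOfScrolls = parseNumberOfScrolls(process.argv[2]);
console.log('numberOfScrolls:', numberOfScrolls);
```

The result, including `null`, can be passed unchanged as the fifth argument to `smartScraper`.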

### Search Scraping

Search and extract information from multiple web sources using AI.
15 changes: 15 additions & 0 deletions scrapegraph-js/examples/smartScraper_infinite_scroll_example.js
@@ -0,0 +1,15 @@
import { smartScraper } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;
// Example URL that requires scrolling (e.g., a social media feed or infinite scroll page)
const url = 'https://example.com/infinite-scroll-page';
const prompt = 'Extract all the posts from the feed';
const numberOfScrolls = 10; // Will scroll 10 times to load more content

try {
  const response = await smartScraper(apiKey, url, prompt, null, numberOfScrolls);
  console.log('Extracted data from scrolled page:', response);
} catch (error) {
  console.error('Error:', error);
}
10 changes: 9 additions & 1 deletion scrapegraph-js/src/smartScraper.js
@@ -10,10 +10,11 @@ import { zodToJsonSchema } from 'zod-to-json-schema';
* @param {string} url - The URL of the webpage to scrape
* @param {string} prompt - Natural language prompt describing what data to extract
* @param {Object} [schema] - Optional schema object defining the output structure
* @param {number} [numberOfScrolls] - Optional number of times to scroll the page (0-100). If not provided, no scrolling will be performed.
* @returns {Promise<string>} Extracted data in JSON format matching the provided schema
* @throws - Will throw an error in case of an HTTP failure.
*/
-export async function smartScraper(apiKey, url, prompt, schema = null) {
+export async function smartScraper(apiKey, url, prompt, schema = null, numberOfScrolls = null) {
const endpoint = 'https://api.scrapegraphai.com/v1/smartscraper';
const headers = {
'accept': 'application/json',
@@ -34,6 +35,13 @@ export async function smartScraper(apiKey, url, prompt, schema = null) {
}
}

if (numberOfScrolls !== null) {
  if (!Number.isInteger(numberOfScrolls) || numberOfScrolls < 0 || numberOfScrolls > 100) {
    throw new Error('numberOfScrolls must be an integer between 0 and 100');
  }
  payload.number_of_scrolls = numberOfScrolls;
}

try {
const response = await axios.post(endpoint, payload, { headers });
return response.data;
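The `numberOfScrolls` branch in `smartScraper` only changes the request body when the argument is not `null`: the camelCase argument is sent as the snake_case field `number_of_scrolls`, and an explicit `0` is still included. The sketch below isolates that behaviour so it can be checked without calling the API. `withNumberOfScrolls` is a hypothetical name, and `example_field` is a placeholder, because the rest of the payload is not shown in this diff.

```javascript
// Hypothetical, standalone version of the numberOfScrolls branch in
// smartScraper: validate the argument, then add it to a copy of the payload.
function withNumberOfScrolls(basePayload, numberOfScrolls = null) {
  // null (the default) means "do not scroll": the field is left out entirely.
  if (numberOfScrolls === null) {
    return { ...basePayload };
  }
  if (!Number.isInteger(numberOfScrolls) || numberOfScrolls < 0 || numberOfScrolls > 100) {
    throw new Error('numberOfScrolls must be an integer between 0 and 100');
  }
  // The API field is snake_case, unlike the camelCase JavaScript argument.
  return { ...basePayload, number_of_scrolls: numberOfScrolls };
}

// Placeholder field only; the real payload keys are not shown in the diff.
const base = { example_field: 'value' };
console.log(withNumberOfScrolls(base));    // { example_field: 'value' }
console.log(withNumberOfScrolls(base, 0)); // { example_field: 'value', number_of_scrolls: 0 }
```

Returning a copy instead of mutating the payload is only a choice made for this sketch; the change itself assigns `payload.number_of_scrolls` in place.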