Download all the files from a directory listing such as https://www.ndbc.noaa.gov/data/ocean/.
npm install scrape-directory-listing
yarn add scrape-directory-listing
pnpm add scrape-directory-listing
👋 Hello there! Follow me @linesofcode or visit linesofcode.dev for more cool projects like this one.
This example will recursively download all the files from https://www.ndbc.noaa.gov/data/ocean/.
import { scrapeDirectoryListing } from 'scrape-directory-listing';
const res = await scrapeDirectoryListing({
url: 'https://www.ndbc.noaa.gov/data/ocean',
});
The response will contain an array of objects with the following properties:
{
item: {
description: string;
modifiedAt: number;
name: string;
path: string;
size: number | null;
type: 'file' | 'directory';
},
data: ArrayBuffer;
headers: Headers;
}
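The shape above can be expressed as a TypeScript type. A sketch for reference only: the interface names `ListingItem` and `ScrapedEntry` are illustrative and are not exported by the library.

```typescript
// Illustrative types mirroring the documented response shape;
// the names below are assumptions, not part of the library's API.
interface ListingItem {
  description: string;
  modifiedAt: number;
  name: string;
  path: string;
  size: number | null;
  type: 'file' | 'directory';
}

interface ScrapedEntry {
  item: ListingItem;
  data: ArrayBuffer;
  headers: Headers;
}

// Example helper: keep only plain files, dropping directory rows.
function onlyFiles(entries: ScrapedEntry[]): ScrapedEntry[] {
  return entries.filter((e) => e.item.type === 'file');
}
```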
import { scrapeDirectoryListing } from 'scrape-directory-listing';
import { writeFile } from 'fs/promises';
const res = await scrapeDirectoryListing({
url: 'https://www.ndbc.noaa.gov/data/ocean',
});
const first = res[0];
await writeFile('output/' + first.item.name, Buffer.from(first.data));
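To save every file rather than just the first, the same pattern extends to a loop. A sketch, assuming the response shape documented above; the `saveEntries` helper, the directory-skipping logic, and the `Entry` type are my additions, not part of the library.

```typescript
import { mkdir, writeFile } from 'fs/promises';
import { join } from 'path';

// Hypothetical entry shape, mirroring the documented response.
type Entry = {
  item: { name: string; type: 'file' | 'directory' };
  data: ArrayBuffer;
};

// Write every file entry into outputDir and return the paths written.
async function saveEntries(entries: Entry[], outputDir: string): Promise<string[]> {
  // Ensure the target directory exists before writing.
  await mkdir(outputDir, { recursive: true });
  const written: string[] = [];
  for (const entry of entries) {
    if (entry.item.type !== 'file') continue; // directory rows carry no file data
    const dest = join(outputDir, entry.item.name);
    await writeFile(dest, Buffer.from(entry.data));
    written.push(dest);
  }
  return written;
}
```

With the response from the snippet above, this would be `await saveEntries(res, 'output')`.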
You can pass a custom fetch function and combine it with your own logic, for example to limit the number of concurrent requests.
import { scrapeDirectoryListing } from 'scrape-directory-listing';
import pLimit from 'p-limit';
// Limit to 1 request at a time
const limit = pLimit(1);
const res = await scrapeDirectoryListing({
url: 'https://www.ndbc.noaa.gov/data/ocean',
fetchFileFn: async (item) => {
return limit(() => fetch(item.url));
},
});
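What p-limit does here is queue each call so that at most N run at once. The idea can be sketched without the dependency as a small promise queue; this is a simplified stand-in to show the mechanism, not p-limit's actual implementation.

```typescript
// Minimal concurrency limiter: at most `max` tasks run at a time.
// Simplified stand-in for p-limit, not its real implementation.
function createLimit(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];

  // Called when a task settles: free the slot and start the next task.
  const next = () => {
    active--;
    queue.shift()?.();
  };

  return function limit<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const run = () => {
        active++;
        task().then(resolve, reject).finally(next);
      };
      if (active < max) run();
      else queue.push(run); // slot busy: wait in line
    });
  };
}
```

A `createLimit(1)` instance could be dropped into `fetchFileFn` exactly like `pLimit(1)` in the example above.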