Web Crawler Project

A web crawler written in JavaScript for Node.js. It crawls each page of a website and generates an "internal links" report for that site.

Preview

Getting Started

Prerequisites

Make sure you have the following installed on your machine:

node: The JavaScript runtime, which runs the project's JavaScript files. Install Node.js version 18.0 or higher.
npm: The package manager. It manages dependencies and metadata, and lets you define "scripts" to run.

The package.json file is created by npm init and contains the start script, which runs main.js with node.

Installation

Clone the repository:

```
git clone https://github.com/Abe-alt/web-crawler.git
```

Navigate to the project directory:
```
cd web-crawler
```

Running the Crawler

To start the web crawler, run the following command, replacing website_to_crawl with the base URL of the site you want to crawl:
```
npm run start website_to_crawl
```
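
The command passes a single argument to the script. As a minimal sketch of how that argument could be validated (parseBaseURL is a hypothetical helper and the error messages are assumptions, not the project's actual code):

```javascript
// Hypothetical helper: validates the arguments that follow the script name.
// process.argv holds [nodePath, scriptPath, ...userArgs], so user args start at index 2.
function parseBaseURL(argv) {
  const args = argv.slice(2);
  if (args.length < 1) {
    throw new Error("no website provided");
  }
  if (args.length > 1) {
    throw new Error("too many command line arguments");
  }
  // The URL constructor throws a TypeError if the string is not an absolute URL.
  return new URL(args[0]).href;
}

// Example with a simulated argv array (example.com is a placeholder site).
console.log(parseBaseURL(["node", "main.js", "https://example.com"])); // prints "https://example.com/"
```

In a real entry point the same function would be called with process.argv instead of the simulated array.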

Features

These are the main functions used in the program:

normalizeURL(): normalizes a URL so that two URLs pointing to the same page compare as equal.
getURLsFromHTML(): takes a string of HTML as input and returns a list of all the link URLs, using the third-party HTML parsing library JSDOM.
crawlPage(): fetches the page at currentURL and recursively follows its links until every page on the site has been crawled.
printReport(pages): converts the pages object into a report and logs it to the console.
main(): requires exactly one CLI argument, the base_url.
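
To illustrate the idea behind normalizeURL(), here is a minimal sketch. It assumes that two URLs count as the same page when their host and path match, ignoring the protocol, the letter case of the host, and a trailing slash. The project's own rules may differ.

```javascript
// Sketch of URL normalization: reduce a URL to "host/path" so that
// protocol, host letter case, and a trailing slash do not matter.
function normalizeURL(urlString) {
  const url = new URL(urlString); // the URL parser lowercases the host
  const hostPath = `${url.host}${url.pathname}`;
  return hostPath.endsWith("/") ? hostPath.slice(0, -1) : hostPath;
}

console.log(normalizeURL("https://Example.com/path/")); // prints "example.com/path"
console.log(normalizeURL("http://example.com/path"));   // prints "example.com/path"
```

Because both calls return the same string, the normalized form can serve as a key when counting how often each page is linked.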

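The recursive crawl and the report can be sketched together as follows. This is an illustration under stated assumptions, not the project's code: getLinks is a hypothetical function standing in for fetching a page and extracting its links, the crawlPage signature is assumed, sortPages is a hypothetical helper, and the report wording is invented. An in-memory fake site replaces real network requests so the sketch runs on its own.

```javascript
// Same normalization idea as above: host + path, without a trailing slash.
function normalizeURL(urlString) {
  const url = new URL(urlString);
  const hostPath = `${url.host}${url.pathname}`;
  return hostPath.endsWith("/") ? hostPath.slice(0, -1) : hostPath;
}

// getLinks is a hypothetical async function: (url) => array of absolute URLs.
async function crawlPage(baseURL, currentURL, pages, getLinks) {
  // Stay on the site: skip links that point to another host.
  if (new URL(currentURL).hostname !== new URL(baseURL).hostname) {
    return pages;
  }
  const key = normalizeURL(currentURL);
  // Already visited: count the extra link and stop recursing.
  if (pages[key] > 0) {
    pages[key]++;
    return pages;
  }
  pages[key] = 1;
  for (const link of await getLinks(currentURL)) {
    await crawlPage(baseURL, link, pages, getLinks);
  }
  return pages;
}

// Turn the pages object into [url, count] pairs, most-linked first.
function sortPages(pages) {
  return Object.entries(pages).sort((a, b) => b[1] - a[1]);
}

function printReport(pages) {
  for (const [url, count] of sortPages(pages)) {
    console.log(`Found ${count} internal links to ${url}`);
  }
}

// In-memory stand-in for a tiny site (placeholder URLs), so no network is needed.
const fakeSite = {
  "https://example.com/": ["https://example.com/about", "https://example.com/blog"],
  "https://example.com/about": ["https://example.com/", "https://other.example.org/"],
  "https://example.com/blog": ["https://example.com/about"],
};
const getLinks = async (url) => fakeSite[url] ?? [];

const crawlDone = crawlPage("https://example.com/", "https://example.com/", {}, getLinks)
  .then((pages) => {
    printReport(pages);
    return pages;
  });
```

Tracking visited pages in the pages object is what ends the recursion: a page that links back to one already seen only increments its counter.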