GO-Crawl

Web crawler and sitemap.xml writer.

How to build a binary

Run the following commands to build a binary for your OS:

Linux: $ env GOOS=linux GOARCH=amd64 go build -o go-crawl

macOS: $ env GOOS=darwin GOARCH=amd64 go build -o go-crawl

Windows: $ env GOOS=windows GOARCH=amd64 go build -o go-crawl.exe

Configuration

A config.json file must be placed next to the go-crawl binary. It contains an array of objects, one per site to crawl.

Example:

[
  {
    "baseUrl": "http://localhost.local",
    "sitemapUrlPath": "/sitemaps/",
    "sitemapPath": "~/web/go-crawl/sitemap/",
    "getFrom": [
    ],
    "filterRules": [
      "^/about/press-centr/press-relize",
      "special_version",
      "^/upload",
      "^/document.php",
      "^/review/\\?",
      "\\.pdf",
      "\\.PDF"
    ]
  }
]
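As an illustration of how the example above maps to Go, here is a minimal sketch that decodes such a file with encoding/json. The SiteConfig type, its field names, and the parseConfig helper are hypothetical and are not taken from the go-crawl source; only the JSON keys come from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// SiteConfig mirrors one object of the config.json array.
// Hypothetical type: only the JSON keys come from the README example.
type SiteConfig struct {
	BaseURL        string   `json:"baseUrl"`
	SitemapURLPath string   `json:"sitemapUrlPath"`
	SitemapPath    string   `json:"sitemapPath"`
	GetFrom        []string `json:"getFrom"`
	FilterRules    []string `json:"filterRules"`
}

// parseConfig decodes the raw contents of config.json into a slice of sites.
func parseConfig(data []byte) ([]SiteConfig, error) {
	var sites []SiteConfig
	if err := json.Unmarshal(data, &sites); err != nil {
		return nil, fmt.Errorf("parse config: %w", err)
	}
	return sites, nil
}

func main() {
	// Inline sample instead of reading a file, so the sketch runs anywhere.
	raw := []byte(`[{
		"baseUrl": "http://localhost.local",
		"sitemapUrlPath": "/sitemaps/",
		"sitemapPath": "~/web/go-crawl/sitemap/",
		"getFrom": [],
		"filterRules": ["^/upload", "\\.pdf"]
	}]`)

	sites, err := parseConfig(raw)
	if err != nil {
		log.Fatal(err)
	}
	for _, s := range sites {
		fmt.Printf("%s: %d filter rules\n", s.BaseURL, len(s.FilterRules))
	}
}
```

Note that a backslash in a regexp has to be doubled inside JSON, which is why the example config writes "\\.pdf" for the pattern \.pdf.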

baseUrl - protocol + host.

sitemapUrlPath - URL path used when generating the sitemap index, which is needed when the total number of links is more than 5000.

sitemapPath - absolute or relative path to where sitemap.xml will be written.

getFrom - array of URLs to crawl for new URLs.

filterRules - array of regexp rules for URLs to skip (case sensitive).
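To make the filterRules behaviour concrete, the sketch below compiles a few rules with Go's regexp package and checks some paths against them. The helper names compileRules and shouldSkip are hypothetical, and matching against the path plus query string (rather than the full URL) is an assumption suggested by rules such as "^/upload"; the actual crawler may differ.

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// compileRules compiles every filter rule once, up front.
// Hypothetical helper, not taken from the go-crawl source.
func compileRules(rules []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(rules))
	for _, r := range rules {
		re, err := regexp.Compile(r)
		if err != nil {
			return nil, fmt.Errorf("invalid filter rule %q: %w", r, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// shouldSkip reports whether any rule matches the given path.
// Assumption: rules are applied to the path and query string only.
func shouldSkip(path string, rules []*regexp.Regexp) bool {
	for _, re := range rules {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	// Raw string literals, so backslashes are written once (unlike in JSON).
	rules, err := compileRules([]string{`^/upload`, `^/review/\?`, `\.pdf`, `\.PDF`})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range []string{"/upload/a.png", "/docs/report.PDF", "/review/?page=2", "/about/"} {
		fmt.Printf("%-20s skip=%v\n", p, shouldSkip(p, rules))
	}
}
```

Because Go regexps are case sensitive by default, the example config lists both "\\.pdf" and "\\.PDF"; a single pattern with the (?i) flag would be an alternative if the crawler passes the rules to the regexp package unchanged.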

To bypass the Bitrix throttler, add User-Agent: GO-Crawl to the b_stat_searcher table: http://localhost.local/bitrix/admin/perfmon_table.php?lang=ru&table_name=b_stat_searcher
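For reference, this is roughly how a Go client attaches such a User-Agent to a request with net/http. It is a sketch only: the newRequest helper is hypothetical, and the README confirms just the header value, not how the crawler builds its requests.

```go
package main

import (
	"fmt"
	"net/http"
)

// userAgent is the value the README asks to register in b_stat_searcher.
const userAgent = "GO-Crawl"

// newRequest builds a GET request that carries the crawler's User-Agent.
// Hypothetical helper, not taken from the go-crawl source.
func newRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	return req, nil
}

func main() {
	// The request is only constructed here, not sent.
	req, err := newRequest("http://localhost.local/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("User-Agent:", req.Header.Get("User-Agent"))
}
```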

Run

$ go-crawl

And it will be done.
