|
1 |
| -# Sentry |
2 |
| -### It watches stuff |
| 1 | +# Identity |
3 | 2 |
|
4 |
| -Sentry is a parallelized web crawler written in [Go](https://golang.org) that writes urls, links, & response headers to a Postgres database, then stores the response itself on amazon S3. It keeps a list of “sources”, which use simple string comparison to keep it from wandering outside of designated domains or url paths. |
| 3 | +[](http://github.com/datatogether) |
| 4 | +[](https://archivers-slack.herokuapp.com/) |
| 5 | +[](./LICENSE) |
| 6 | +[](https://codecov.io/gh/datatogether/identity) |
5 | 7 |
|
6 |
| -The big difference from other crawlers is a tunable “stale duration”, which will tell the crawler to capture an updated snapshot of the page if the time since the last GET request is older than the stale duration. This gives it a continual “watching” property. |
| 8 | +[1-3 sentence description of repository contents] |
7 | 9 |
|
8 |
| -Sentry holds a separate stream of scraping for any url that looks like a dataset. So when it encounters urls that look like `https://foo.com/file.csv`, it assumes that file ending may be a static asset, and places that url on a separate thread for archiving. |
| 10 | +## License & Copyright |
9 | 11 |
|
10 |
| -# Related Projects |
| 12 | +[Modelled on [project guidelines template](https://github.com/datatogether/roadmap/blob/master/PROJECT.md#license--copyright-readme-block)] |
11 | 13 |
|
12 |
| -In parallel to building this tool, we have engaged in efforts to map the landscape of similar projects: |
| 14 | +## Getting Involved |
13 | 15 |
|
14 |
| -:eyes: See: [**Comparison of web archiving software**](https://github.com/datatogether/research/tree/master/web_archiving) |
| 16 | +We would love involvement from more people! If you notice any errors or would like to submit changes, please see our [Contributing Guidelines](./.github/CONTRIBUTING.md). |
| 17 | + |
| 18 | +We use GitHub issues for [tracking bugs and feature requests](https://github.com/datatogether/REPONAME/issues) and Pull Requests (PRs) for [submitting changes](https://github.com/datatogether/REPONAME/pulls). |
| 19 | + |
| 20 | +## ... |
| 21 | + |
| 22 | +## [Optional section(s) on Installation (actually using the service!), Architecture, Dependencies, and Other Considerations] |
| 23 | + |
| 24 | +[fill out this section if the repo contains deployable/installable code] |
| 25 | + |
| 26 | +## Development |
| 27 | + |
| 28 | +[Step-by-step instructions about how to set up a local dev environment and any dependencies] |
| 29 | + |
| 30 | +## Deployment |
| 31 | + |
| 32 | +[Optional section with deployment instructions] |