This repository was archived by the owner on Nov 7, 2019. It is now read-only.

Commit 14fce16

revise readme
1 parent 158ca8b commit 14fce16


1 file changed: +26 -8 lines


README.md

Lines changed: 26 additions & 8 deletions
```diff
@@ -1,14 +1,32 @@
-# Sentry
-### It watches stuff
+# Identity
 
-Sentry is a parallelized web crawler written in [Go](https://golang.org) that writes urls, links, & response headers to a Postgres database, then stores the response itself on amazon S3. It keeps a list of “sources”, which use simple string comparison to keep it from wandering outside of designated domains or url paths.
+[![GitHub](https://img.shields.io/badge/project-Data_Together-487b57.svg?style=flat-square)](http://github.com/datatogether)
+[![Slack](https://img.shields.io/badge/slack-Archivers-b44e88.svg?style=flat-square)](https://archivers-slack.herokuapp.com/)
+[![License](https://img.shields.io/github/license/mashape/apistatus.svg)](./LICENSE)
+[![Codecov](https://img.shields.io/codecov/c/github/datatogether/identity.svg?style=flat-square)](https://codecov.io/gh/datatogether/identity)
 
-The big difference from other crawlers is a tunable “stale duration”, which will tell the crawler to capture an updated snapshot of the page if the time since the last GET request is older than the stale duration. This gives it a continual “watching” property.
+[1-3 sentence description of repository contents]
 
-Sentry holds a separate stream of scraping for any url that looks like a dataset. So when it encounters urls that look like `https://foo.com/file.csv`, it assumes that file ending may be a static asset, and places that url on a separate thread for archiving.
+## License & Copyright
 
-# Related Projects
+[Modelled on [project guidelines template](https://github.com/datatogether/roadmap/blob/master/PROJECT.md#license--copyright-readme-block) ]
 
-In parallel to building this tool, we have engaged in efforts to map the landscape of similar projects:
+## Getting Involved
 
-:eyes: See: [**Comparison of web archiving software**](https://github.com/datatogether/research/tree/master/web_archiving)
+We would love involvement from more people! If you notice any errors or would like to submit changes, please see our [Contributing Guidelines](./.github/CONTRIBUTING.md).
+
+We use GitHub issues for [tracking bugs and feature requests](https://github.com/datatogether/REPONAME/issues) and Pull Requests (PRs) for [submitting changes](https://github.com/datatogether/REPONAME/pulls)
+
+## ...
+
+## [Optional section(s) on Installation (actually using the service!), Architecture, Dependencies, and Other Considerations]
+
+[fill out this section if the repo contains deployable/installable code]
+
+## Development
+
+[Step-by-step instructions about how to set up a local dev environment and any dependencies]
+
+## Deployment
+
+[Optional section with deployment instructions]
```
