Limit log size #162

Open
mlandauer opened this issue Apr 23, 2020 · 4 comments

Comments

@mlandauer
Member

Because logs are kept in Redis, there is a practical limit on how large they should be. That's not a bad thing anyway: we have similar limits in place on morph.io.

@mlandauer
Member Author

Doing a quick back-of-the-envelope calculation: our current production Redis (ElastiCache) instance has about 500MB of memory. If we want to comfortably support up to 1000 concurrent scrapers, we can't allow each scraper to use more than 500KB of data on Redis. Say 250KB of that is dedicated to the streaming logs, and then we have heaps of headroom.

250KB corresponds roughly to 2000 lines of 128 characters each. That's a pretty respectable number.
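The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this example, units are decimal (1KB = 1000 bytes), each character is assumed to take one byte, and Redis's own per-key overhead is ignored.

```python
# Back-of-the-envelope memory budget; all names here are hypothetical.
KB = 1000
MB = 1000 * KB


def per_scraper_budget(total_bytes: int, scrapers: int) -> int:
    """Bytes each scraper may use if memory is split evenly."""
    return total_bytes // scrapers


def lines_that_fit(log_budget_bytes: int, bytes_per_line: int) -> int:
    """Whole log lines that fit in the log budget."""
    return log_budget_bytes // bytes_per_line


budget = per_scraper_budget(500 * MB, 1000)  # 500KB per scraper
log_budget = budget // 2                     # half of it for streaming logs
max_lines = lines_that_fit(log_budget, 128)  # a bit under 2000 lines
print(budget, log_budget, max_lines)
```

With these assumptions the log budget holds 1953 lines of 128 bytes, which is where the "roughly 2000 lines" figure comes from.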

@mlandauer
Member Author

morph.io allows up to 10,000 log lines. If we assume 128 characters on each, that gives a total memory usage for the logs of about 1.2MB per scraper. With that memory usage, the same 500MB instance could probably support about 400 simultaneous scrapers.

It is of course easy enough to just get a Redis instance with more memory, but it's nice to know that we can support quite a lot of simultaneous users without going completely overboard with the size of the Redis nodes.

@mlandauer
Member Author

Maybe we should pick a total size limit on the logs of 1MB. That's a nice round number and roughly matches the current restriction in place on morph.io.
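To make the idea of a total size limit concrete, here is a minimal in-memory sketch of a byte-capped log. It is not how the project or morph.io implements the limit: the `CappedLog` class is hypothetical, and dropping the oldest lines first is just one possible policy (refusing new lines once the cap is hit would be another).

```python
from collections import deque


class CappedLog:
    """Hypothetical log buffer that keeps at most max_bytes of line data."""

    def __init__(self, max_bytes: int = 1_000_000) -> None:
        self.max_bytes = max_bytes
        self.lines: deque[str] = deque()
        self.size = 0  # total UTF-8 bytes currently stored

    def append(self, line: str) -> None:
        self.lines.append(line)
        self.size += len(line.encode("utf-8"))
        # Evict the oldest lines until we are back under the cap.
        while self.size > self.max_bytes:
            oldest = self.lines.popleft()
            self.size -= len(oldest.encode("utf-8"))


log = CappedLog(max_bytes=10)
for line in ["aaaa", "bbbb", "cccc"]:
    log.append(line)
print(list(log.lines), log.size)
```

With a 10-byte cap, appending three 4-byte lines evicts the first one, leaving the two most recent lines and 8 bytes in use.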

@mlandauer
Member Author

Coming back to this issue after a long time away, it seems to me that the fundamental bottleneck created by using Redis (which stores everything in memory) is a good argument against using Redis at all. Perhaps Postgres, with its support for streaming, is a more sensible choice here.
