Large scale considerations #173
Comments
Thanks for bringing these up! A couple of thoughts on python/fastapi perf, distributed filesystem, and database.
BTW we have done some load testing using locust and we can process around 100 rps (requests per second) on a standard laptop using a single quetz worker (for the download endpoint, which generates a redirect to an S3 file).
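For context, a minimal locustfile along these lines might look like the sketch below. The endpoint path is a placeholder, not Quetz's actual route, and following the S3 redirect is disabled so only the Quetz response is measured.

```python
# Hedged sketch of a locust load test against a download endpoint.
from locust import HttpUser, task, between


class DownloadUser(HttpUser):
    # Simulated users pause briefly between requests.
    wait_time = between(0.1, 0.5)

    @task
    def download_package(self):
        # Placeholder channel/package path; the real download route issues a
        # redirect to the S3-hosted file, so we skip following the redirect
        # to measure only the Quetz side.
        self.client.get(
            "/get/my-channel/linux-64/some-package-1.0-0.tar.bz2",
            allow_redirects=False,
        )
```

Running it with something like `locust -f locustfile.py --host <quetz-url>` prints the stats summary table (requests, failures, response-time percentiles) at the end, which is the output worth keeping around.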
@btel Aw, yeah, locust is wonderful! (disclaimer: maintains the conda-forge feedstock 👿). For giggles, can you toss in the stats summary output? While pretty, I find the charts lie, as small error counts, etc. can still look flat. It would be lovely to have this under test... not for absolute numbers, but to catch significant regressions (e.g. the server starts throwing lots of 500s). Basically, CI caches the repo's …

To that point, having this for every route is important, especially with a couple of admins and a horde of users changing lots of stuff (especially permissions!) at a furious rate, as it can reveal nasty things like full-table database locks which don't get caught when routes are tested in isolation.

Another tool in the shed, to both improve the baseline and help debug perf regressions, is the opencensus stack, with a simple example here. It looks like there is some work going on to give finer-grained insights on the fastapi side, while the sqlalchemy integration is already very robust. I've used the jaeger integration (also on conda-forge, might need some maintainer ❤️) for reporting. I've yet to do a FULL full-stack integration with opencensus-web, but this is the real cadillac, as you can trace a button press in the SPA to pixels on the page for a single request, which is a thing of beauty when it works properly.

Having all these hooks built in to the various tiers, ready to be turned on by a site admin, can help them really own an application, beyond simple log mining, and can yield much better issue reports. Trying to get this level of insight from a "hostile" application is... harder.
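To make the tracing idea concrete, here is a hedged sketch of wiring the opencensus SQLAlchemy integration to a Jaeger exporter. The service name, agent address, and span name are illustrative assumptions, not anything Quetz ships today, and it assumes `opencensus`, `opencensus-ext-sqlalchemy`, `opencensus-ext-jaeger`, and a local Jaeger agent are available.

```python
# Hedged sketch: report SQLAlchemy query spans to a local Jaeger agent.
from opencensus.ext.jaeger.trace_exporter import JaegerExporter
from opencensus.trace import config_integration
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.tracer import Tracer

# Patch SQLAlchemy so every query is reported as a span.
config_integration.trace_integrations(["sqlalchemy"])

exporter = JaegerExporter(
    service_name="quetz-api",      # placeholder service name
    agent_host_name="localhost",   # assumes a Jaeger agent on the default port
    agent_port=6831,
)

tracer = Tracer(exporter=exporter, sampler=AlwaysOnSampler())

# Anything executed inside the span (including SQLAlchemy queries) shows up
# in the Jaeger UI as a trace with child spans.
with tracer.span(name="download-endpoint"):
    pass  # handler logic would go here
```

In a real deployment the sampler would typically be probabilistic rather than always-on, so tracing can stay enabled without adding overhead to every request.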
Hi @bollwyvl, thanks for the valuable suggestions. Automating load testing is definitely on our roadmap. I haven't ever used the opencensus stack; it's definitely something I would like to investigate. Thanks again for the pointers!
I forgot about the locust stats, I need to re-generate them, because stupidly I did not keep them. BTW, we benchmarked the download endpoint because it's the one that's going to be hit most frequently by users (and CIs), but I agree we should test other endpoints as well. Bartosz
Minor update: we got the jaeger-feedstock updated to the most recent version (Go has been rethinking its packaging approach, har).
Another update: a go-ipfs-feedstock should exist soon (not up yet, but GH is having a bad day, I guess). @wolfv, @yuvipanda and I have been semi-seriously kicking around ideas on federated stuff for a while, so I guess it's a little more real (to me) now!
What could speed up quetz by a lot is a smart caching system. Most quetz content is static files that don't get updated very often, and there is no need to make a db or even a fastapi request for content that has already been served and hasn't changed since. For an in-memory cache, fastapi-cache is the obvious choice. But the easiest strategy to implement is probably generating proper ETags in fastapi and then putting a really large NGINX content cache on top (see the sketch below for the ETag part). NGINX is really fast at serving static content, and if most of the requests get cached, the raw fastapi performance is less of an issue. The other things that might be worth a look are Backblaze for storage and Cloudflare as a CDN.
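As a rough illustration of the ETag half of that strategy, here is a sketch of conditional responses on a FastAPI route. The route path and the `load_repodata` helper are hypothetical stand-ins, not Quetz's real API; an NGINX cache in front would then revalidate with `If-None-Match` instead of hitting the database.

```python
# Hedged sketch: serve a static-ish JSON file with an ETag and answer 304
# when the client (or a cache in front) already has the current version.
import hashlib

from fastapi import FastAPI, Request, Response

app = FastAPI()


def load_repodata(channel: str) -> bytes:
    # Hypothetical loader; stands in for however the file would really be read.
    return b'{"packages": {}}'


@app.get("/channels/{channel}/repodata.json")
def repodata(channel: str, request: Request) -> Response:
    body = load_repodata(channel)
    etag = '"%s"' % hashlib.sha256(body).hexdigest()

    # A client or reverse-proxy cache that already holds this version gets a
    # 304 and never re-downloads the body.
    if request.headers.get("if-none-match") == etag:
        return Response(status_code=304, headers={"ETag": etag})

    return Response(
        content=body,
        media_type="application/json",
        headers={"ETag": etag},
    )
```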
Personally I wouldn't dare to put a production system right on top of IPFS. But IPFS could be the perfect solution for a long-term package archive and/or package distribution system.
I spent time trying to run IPFS for one of my side projects, but switched back to an S3 API instead. You still need to run pinning nodes, and the well-tested setups pin content to filesystems. So you end up needing to run a cluster that requires you to run file systems, which can get messy. Latency was also highly variable. I think it's getting better, but it's not useful at medium to large scale right now.
I would like to open this issue to list the points that are important to keep in mind in the development of Quetz from the perspective of large scale use. What I have in mind:

Language or dependencies
- the load that FastAPI could handle

Database/storage
- PGSQL: projections of volumetry and ops/s to be able to handle

Others

This is just a draft to be updated with contributions (concerns, solutions, links to PRs, etc.)!