Support for multiple wikis running on one container/pod #57
Comments
This idea is very good. I intend to create at least 3 wikis, preferably on the same MediaWiki base, and it would be great to keep everything in one container or a single installation. From what I saw, this would be a wiki farm or wiki family. I found this documentation: https://www.mediawiki.org/wiki/Manual:Wiki_family , and the wiki farm category https://www.mediawiki.org/wiki/Category:Wiki_farm , which lists some interesting extensions for creating and managing wikis.
Yes
Are there any significant benefits to having a farm running inside a single Canasta container, vs. having a compose stack running a single database instance plus multiple Canasta containers, one per wiki?
In my opinion, the primary benefit is performance improvement. When MediaWiki runs multiple wiki instances on the same codebase, it allows PHP to cache a lot of the same opcode. That not only makes running MediaWiki more resource-efficient, it also significantly improves scalability. It seems difficult, but actually it is not too bad from the MediaWiki side. It's what the Wikimedia Foundation does to run all of their several hundred wikis. Each of their application servers is capable of serving traffic to render any of their wikis.
@jeffw16 - can you clarify what you mean by "performance improvement"? How much faster would it be to run, say, 20 wikis in one Canasta container, vs. having 20 containers with one wiki each?
Per Yaron's request, let me give a quick example of the scalability concerns. Say we are running 20 wikis, each with 2 GB of memory; that requires 40 GB of memory, because we are running 20 separate instances of Apache and MediaWiki. There is no reason to duplicate Apache 20 times and MediaWiki 20 times. The Wikimedia Foundation actually runs just one instance of Apache and MediaWiki per server. In order to serve different wikis on the same MediaWiki instance, it uses an if statement in
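That kind of hostname-based dispatch might look roughly like the following minimal, hypothetical sketch. The hostnames and file layout are invented for illustration; this is not the WMF's or Canasta's actual configuration:

```php
<?php
// Hypothetical shared LocalSettings.php entry point: one codebase serves
// every wiki, and the per-request hostname decides which wiki's settings
// are loaded. All names and paths here are invented for illustration.

$host = $_SERVER['SERVER_NAME'] ?? 'default.example.org';

if ( $host === 'wiki1.example.org' ) {
    require_once __DIR__ . '/settings/wiki1.php';
} elseif ( $host === 'wiki2.example.org' ) {
    require_once __DIR__ . '/settings/wiki2.php';
} else {
    require_once __DIR__ . '/settings/default.php';
}
```

Because every request runs through the same set of PHP files, the opcache compiles each file once and reuses it for all wikis.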
I don't think we can gain anything by running multiple wikis in just one Canasta container, and I don't think it works as you described. Also, I think the opcode cache is small and hardly affects anything compared to other factors. Apparently we have different theoretical ideas about how it works; we need some objective data.
I'm speaking from the experience we have at @mywikis running our servers and the architectural changes we've made in order to go from a slow platform to a fast, scalable one. We currently can run hundreds of wikis on the same application server (if not more). Please elucidate what you mean by "I don't think it works as you described" - if we have 20 containers and each uses 2 GB of RAM, why is 20 x 2 not equal to 40?
I disagree. If you try to run 20 wikis on the same Apache server but on different MediaWiki instances (i.e. different directories), PHP will not recognize them as the same opcode, and it will be slower. Using multiple wikis at the same time will overwhelm the Apache server, especially if it is using mpm_prefork rather than mpm_event. I can't imagine interpreting vs. compiling having a negligible performance difference; clearly precompiling should make a sizeable difference.
If you reserve 2 GB of RAM for each of 20 containers, then 20 x 2 does equal 40. But as far as I know, a Docker container is not a virtual machine; it is a prepared environment for a process. When 20 wikis run in 20 containers, they need at least 20 separate processes, and each process uses however much RAM its configuration allows. I agree that we pay for separate processes with separate environments, but we have no metrics on how much we pay. I think the opcode cache is small and hardly affects anything compared to other factors. I think the file cache is the same story, because it uses the same files for all the containers. The file cache is very important, and when you have different MediaWiki instances (different directories) you need much more RAM for the file cache (20 times more); and if RAM is limited (it is), things slow down because the files have to be read from the hard drive.
Do you have any metrics? I think it matters a lot for static files, which could be served about twice as fast. But I think the difference is not so big for PHP files; we can probably gain only about 5%.
It is a big difference on its own, but compared to, for example, a too-small file cache, it is 0.01% of overall performance. We can discuss performance on heavily loaded servers once we have metrics from those servers. One small mistake in configuration can critically degrade performance. My experience says it never works the way we imagine in theory; somewhere I'm right, somewhere I'm wrong. Only metrics can answer this.
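For what it's worth, PHP exposes the opcache numbers directly, so the cache-size question doesn't have to stay theoretical. A minimal sketch, assuming opcache is enabled in the web SAPI (it is usually disabled for CLI), deployed only as a temporary, access-restricted script:

```php
<?php
// Dump a few opcache statistics so the cache-size debate can be settled
// with numbers instead of theory.

$status = opcache_get_status( false ); // false = omit per-script details

if ( $status === false ) {
    exit( "opcache is not enabled\n" );
}

$stats = $status['opcache_statistics'];
$mem   = $status['memory_usage'];

printf( "cached scripts: %d\n",      $stats['num_cached_scripts'] );
printf( "hit rate:       %.2f%%\n",  $stats['opcache_hit_rate'] );
printf( "used memory:    %.1f MiB\n", $mem['used_memory'] / 1048576 );
printf( "free memory:    %.1f MiB\n", $mem['free_memory'] / 1048576 );
```

Comparing `num_cached_scripts` and `used_memory` between a shared-codebase farm and N separate directories would give the objective data being asked for here.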
Although I don't have test cases and hard metrics, my experience running wikis supports @jeffw16's statements. And the opcode cache is definitely a big factor in PHP performance. So I'd say the best architecture is to have the farm inside a single container.
I think it is obvious that @freephile and @jeffw16 are correct here, but I admit assertions lack the weight of actual metrics. If a comparison were done between 5 shared wikis on a single Canasta instance vs. 5 separate Canasta instances, would that satisfy your request for metrics, @pastakhov?
If someone wants to provide metrics, that's fine, but let's maybe use our time for something more productive, i.e., implementing this.
Well, before metrics can be supplied, there has to be some sort of implementation of this (even if it is a fragile one, as a proof of concept).
That's the thing: until a PoC is close enough, I'll speak from the empirical experience we have at MyWikis. We have a similar setup to what I've proposed in this thread, and we've found that the PHP opcache is very important in speeding things up. At MyWikis we are not going to worry about metrics at the lowest level; our priority is ensuring load times remain low, and they have.
At my old workplace we ran about 10 wikis, each with its own codebase, even though we were planning to consolidate to a shared database and static resources for all wikis. I think the issue with having the same codebase would be the locks it might set when the wikis access the same resource.
@gitmapd Thanks for your concern; it brings up a good point. I should clarify: in the proposed setup, we would still give each wiki its own database. Using the same codebase does not preclude the use of different databases. We'll use the hostname of each HTTP request to switch between different LocalSettings.php files.
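As a hedged illustration of that separation, here is what one such per-wiki settings file might look like. The setting names (`$wgDBname` and friends) are standard MediaWiki configuration variables, but the values, filenames, and layout are invented for this sketch:

```php
<?php
// settings/wiki1.php - hypothetical per-wiki settings file loaded by the
// hostname dispatch described above. A shared codebase does not force a
// shared database: each wiki points at its own database name.

$wgSitename = "Wiki One";
$wgServer   = "https://wiki1.example.org";

$wgDBtype     = "mysql";
$wgDBserver   = "db";          // one shared database container/host...
$wgDBname     = "wiki1";       // ...but a separate database per wiki
$wgDBuser     = "wiki1_user";
$wgDBpassword = "change-me";

// A per-wiki upload directory keeps static resources isolated as well.
$wgUploadDirectory = "/var/www/uploads/wiki1";
$wgUploadPath      = "/uploads/wiki1";
```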
I think that @jeffw16's suggestions regarding the performance gains of running a common farm setup (single codebase + multiple databases) make sense, and at the same time @pastakhov's notes on the nature of Docker processes and resource sharing are also legitimate. So this actually feels like something where one should pick whatever best fits the goal. For example: if I needed horizontal scalability, I think I'd pick the farm approach, to be able to easily grow the number of container copies serving requests; if instead there were a need to quickly spin up custom, variable wiki copies, I'd pick one wiki per container, maybe even each with its own database, sacrificing resources for the sake of higher isolation.

But my point is not solely about performance. I have some doubts about how reasonable it would be to integrate this into the Canasta image/stack. It feels like introducing a farm mode will require a fairly big rework of the way the stack/containers are configured and how init processes are performed. Aside from that, farm mode does not seem in line with Canasta's image of being a simple-to-set-up, user-friendly thing.

Either way, I'd definitely be curious to see an implementation of a farm as a PoC, since such a setup would have its own pros, like easier horizontal scalability.
Thanks @vedmaka for chiming in and making some excellent points!
Completely agreed. There's no reason someone shouldn't still be able to run entirely separate Canasta instances.
I think in principle it should be fine. Here are my thoughts on how it affects each component:
Becoming too complicated is definitely a legitimate concern. I certainly favor striking a balance so that Canasta remains easy to use for single-wiki use cases.
@jeffw16
Do you know where one can follow the progress of this feature? The ideas are excellent, even more so if Traefik is used instead of Caddy; there is even an issue about that.
@FelipoAntonoff - this project was just accepted for Google Summer of Code last week, to be done by @chl178: https://phabricator.wikimedia.org/T333773. So the plan is for this to get done this summer. I don't know if there will be one place where one can see the progress, but hopefully by September or so there will be support for multiple wikis/wiki farms.
Thanks @yaronkoren for the information; this project will be very interesting, and I'll keep an eye out. It should help a lot with the adoption of MediaWiki, because in many use cases I imagine the ideal is to have N wikis to separate private, public, or other projects. MediaWiki itself should support this more natively and in a more automated way, but Canasta will already help anyone who wants to automate this process using containers :)
It's well overdue, but I'm closing this issue now!
Allow for Canasta to have the option of running multiple wikis using just one Canasta container.