
Support for multiple wikis running on one container/pod #57

Closed
jeffw16 opened this issue Mar 29, 2022 · 24 comments
Labels
enhancement New feature or request

Comments

@jeffw16
Member

jeffw16 commented Mar 29, 2022

Add the option for Canasta to run multiple wikis using just one Canasta container.

@jeffw16 jeffw16 added the enhancement New feature or request label Mar 29, 2022
@FelipoAntonoff

This idea is very good. I intend to create at least three wikis, preferably on the same MediaWiki base, and it would be great to keep everything in a single container or a single installation.

From what I've seen, this would be a wiki farm or wiki family. I found this documentation about it: https://www.mediawiki.org/wiki/Manual:Wiki_family and the wiki farm category https://www.mediawiki.org/wiki/Category:Wiki_farm , which lists some interesting extensions for creating and managing wikis.

@jeffw16
Member Author

jeffw16 commented May 30, 2022

Yes

@vedmaka
Collaborator

vedmaka commented Jul 22, 2022

Are there any significant benefits to having a farm running inside a single Canasta container versus a compose stack running a single database instance plus multiple Canasta containers, one per wiki?

@jeffw16
Member Author

jeffw16 commented Jul 22, 2022

In my opinion, the primary benefit is performance improvement. When MediaWiki runs multiple wiki instances on the same codebase, it allows PHP to cache a lot of the same opcode. That not only makes running MediaWiki more resource-efficient, it also significantly improves scalability.

It seems difficult, but it's actually not too bad from the MediaWiki side. It's what the Wikimedia Foundation does to run all of its several hundred wikis: each of their application servers is capable of serving traffic for any of their wikis.

@yaronkoren
Member

@jeffw16 - can you clarify what you mean by "performance improvement"? How much faster would it be to run, say, 20 wikis in one Canasta container, vs. having 20 containers with one wiki each?

@jeffw16
Member Author

jeffw16 commented Aug 31, 2022

Per Yaron's request, let me give a quick example about scalability concerns:

Say we are running 20 wikis, each with 2 GB of memory. That requires 40 GB of memory to run. We are running 20 separate instances of Apache and MediaWiki, and there is no reason to duplicate Apache and MediaWiki 20 times each. The Wikimedia Foundation actually runs just one instance of Apache and MediaWiki per server. In order to serve different wikis from the same MediaWiki instance, it uses an if statement in LocalSettings.php to detect which wiki is being requested by the user or maintenance script (via the Host HTTP header or the MW_ID const, respectively). And yes, 20 wikis can easily run in 2 GB without crashing.
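The hostname-based dispatch described above can be sketched in a shared LocalSettings.php roughly like this. This is a minimal sketch, not Canasta's actual layout: the hostnames, wiki IDs, and per-wiki settings paths are hypothetical, and in current MediaWiki the `--wiki` flag is exposed to LocalSettings.php as the `MW_DB` constant.

```php
<?php
// Hypothetical map from incoming hostname to wiki ID.
$wikiByHost = [
    'wiki1.example.com' => 'wiki1',
    'wiki2.example.com' => 'wiki2',
];

if ( defined( 'MW_DB' ) ) {
    // Maintenance scripts select the wiki with the --wiki flag,
    // which MediaWiki exposes here as the MW_DB constant.
    $wikiId = MW_DB;
} else {
    // Web requests: pick the wiki from the requested hostname.
    $host = $_SERVER['SERVER_NAME'] ?? '';
    $wikiId = $wikiByHost[$host] ?? null;
    if ( $wikiId === null ) {
        die( 'Unknown wiki.' );
    }
}

// Load the per-wiki settings (database name, site name, etc.)
// from a hypothetical per-wiki settings directory.
require_once "/path/to/settings/$wikiId.php";
```

All wikis share one codebase; only the small per-wiki settings file differs per request.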

@pastakhov
Collaborator

pastakhov commented Aug 31, 2022

I don't think we can gain anything by running multiple wikis using just one Canasta container.

Say we are running 20 wikis, each with 2 GB of memory. That requires 40 GB of memory to run

and I don't think it works as you described.

it allows PHP to cache a lot of the same opcode

Also, I think the opcode cache is small and hardly affects anything compared to other factors.

Apparently we have different theoretical ideas about how this works. We need some objective data.

@jeffw16
Member Author

jeffw16 commented Aug 31, 2022

I'm speaking from the experience we have at @mywikis running our servers, and from the architectural changes we've made in order to go from a slow platform to a fast, scalable one. We can currently run hundreds of wikis (if not more) on the same application server.

Please elucidate what you mean by "I don't think it works as you described" - if we have 20 containers and each uses 2 GB of RAM, why is 20 x 2 not equal to 40?

@jeffw16
Member Author

jeffw16 commented Aug 31, 2022

Also I think the opcode cache is small and hardly affects anything compared to other things.

I disagree. If you try to run 20 wikis on the same Apache server but on different MediaWiki instances (i.e. different directories), PHP will not recognize them as the same opcode and it will be slower. Using multiple wikis at the same time will overwhelm the Apache server, especially if it is using mpm_prefork rather than mpm_event. I can't imagine interpreting vs. compiling having negligible performance differences; clearly, precompiling should make a sizeable difference.
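For context on the opcache point: PHP's opcache keys compiled scripts by file path, so 20 copies of the same codebase in different directories are cached as 20 distinct sets of files, while one shared codebase is compiled once for all wikis. An illustrative php.ini fragment (the values here are examples, not tuned recommendations):

```ini
; Illustrative opcache settings in php.ini.
opcache.enable=1
; Shared memory budget for compiled scripts, in MB; with a single
; shared codebase, every wiki reuses the same compiled files.
opcache.memory_consumption=256
; Upper bound on distinct cached files; 20 separate MediaWiki
; checkouts would multiply the number of distinct paths to cache.
opcache.max_accelerated_files=20000
```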

@pastakhov
Collaborator

if we have 20 containers and each uses 2 GB of RAM, why is 20 x 2 not equal to 40?

If you reserve 2 GB of RAM for each of 20 containers, then 20×2 equals 40. But as far as I know, a Docker container is not a virtual machine; it is a prepared environment for a process. When 20 wikis run in 20 containers, they need at least 20 separate processes, and each process uses the RAM allowed by its configuration. I agree that we pay for separate processes with separate environments, but we have no metrics on how much we pay. I still think the opcode cache is small and hardly affects anything compared to other factors. I think the file cache is shared anyway, because all the containers use the same files. The file cache is very important, and when you have different MediaWiki instances (directories) you need much more RAM for the file cache (20 times more); if RAM is limited (and it is), things run slowly because the files have to be read from the hard drive.

Using multiple wikis at the same time will overwhelm the Apache server especially if it is using mpm prefork and not mpm event.

Do you have any metrics? I think this matters a lot for static files, where it could be about twice as fast. But I don't think there is such a big difference with PHP files; we can probably gain only about 5%.

I can't imagine interpreting vs. compiling having negligible performance differences

It is a big difference, but compared to, say, an undersized file cache, it amounts to 0.01% of overall performance.

We can discuss performance on heavily loaded servers once we have metrics from those servers. One small mistake in configuration can critically degrade performance. My experience says it never works the way we imagine in theory. Somewhere I'm right, somewhere I'm wrong; only metrics can answer this.

@freephile
Contributor

Although I don't have test cases and hard metrics, my experience running wikis supports @jeffw16's statements, and the opcode cache is definitely a big factor in PHP performance. So I'd say the best architecture is to have the farm inside a single container.

@hexmode
Contributor

hexmode commented Sep 29, 2022

I think it is obvious @freephile and @jeffw16 are correct here, but I admit assertions lack the weight of actual metrics.

If a comparison were done between 5 shared wikis on a single Canasta instance vs. 5 separate Canasta instances, would that satisfy your request for metrics, @pastakhov?

@jeffw16
Member Author

jeffw16 commented Sep 30, 2022

If someone wants to provide metrics, that's fine, but let's maybe use our time for something more productive; i.e. implementing this.

@hexmode
Contributor

hexmode commented Oct 1, 2022

If someone wants to provide metrics, that's fine, but let's maybe use our time for something more productive; i.e. implementing this.

Well, before metrics can be supplied there has to be some sort of implementation of this (even if it is a fragile one for proof of concept).

@jeffw16
Member Author

jeffw16 commented Oct 1, 2022

That's the thing: if a PoC is close enough, I can speak from the empirical experience we have at MyWikis. We run a setup similar to what I've proposed in this thread, and we've found that the PHP opcache is very important for speeding things up. At MyWikis we aren't going to worry about metrics at the lowest level; our priority is keeping load times low, and they have stayed low.

@gitmapd

gitmapd commented Oct 16, 2022

At my old workplace we ran about 10 wikis with different codebases, even though we were planning to consolidate to the same database and static resources for all wikis. I think the issue with a shared codebase would be the locks it might set when the wikis access the same resource.

@jeffw16
Member Author

jeffw16 commented Oct 16, 2022

@gitmapd Thanks for your concern and that brings up a good point. I should clarify: in the proposed setup, we would still be giving each wiki its own database. Using the same codebase does not preclude the use of different databases. We'll use the hostname of each HTTP request to switch between different LocalSettings.php files.
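To illustrate the shared-codebase, separate-databases point: after the hostname dispatch, each wiki's small settings file would set only its own database and identity values. A hypothetical sketch (the file path, database name, and URLs are illustrative):

```php
<?php
// Hypothetical per-wiki settings file (e.g. settings/wiki1.php),
// loaded by the shared LocalSettings.php after hostname dispatch.
$wgDBname   = 'wiki1';                     // each wiki gets its own database
$wgServer   = 'https://wiki1.example.com'; // canonical URL for this wiki
$wgSitename = 'Wiki One';
// Code, extensions, and skins all come from the one shared checkout,
// so only these per-wiki values differ between wikis.
```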

@vedmaka
Collaborator

vedmaka commented Oct 23, 2022

I think @jeffw16's suggestions regarding the performance gains of a common farm setup (single codebase + multiple databases) make sense, and at the same time @pastakhov's notes on the nature of Docker processes and resource sharing are also legitimate.

So this actually feels like a case where one should pick whatever best fits the goal. For example: if I needed horizontal scalability, I'd pick the farm approach, to be able to easily grow the number of container copies serving requests; if instead I needed to quickly spin up custom, variable wiki copies, I'd pick one wiki per container, maybe even each with its own database, sacrificing resources for the sake of higher isolation.

But my point is not solely about performance. I have some doubts about how reasonable it would be to integrate this into the Canasta image/stack; it feels like introducing a farm mode will require quite a big rework of the way the stack/containers are configured and how init processes are performed. Aside from that, farm mode does not seem in line with Canasta's image of being a simple-to-set-up, user-friendly thing.

Either way, I'd definitely be curious to see an implementation of a farm as a PoC, since such a setup would have its own pros, like easier horizontal scalability.

@jeffw16
Member Author

jeffw16 commented Oct 24, 2022

Thanks @vedmaka for chiming in and making some very excellent points!

So this actually feels like a case where one should pick whatever best fits the goal

Completely agreed. There should not be any reason someone can't do entirely separate Canasta instances.

I have some doubts about how reasonable it would be to integrate this into the Canasta image/stack; it feels like introducing a farm mode will require quite a big rework of the way the stack/containers are configured

I think in principle it should be fine. Here's my thoughts on how it affects each component:

  • The major thing would be reworking how config files are stored. Now, we'd need to have multiple config folders, one for each wiki.
  • Our existing database container setup should be fine; no changes needed here.
  • Supporting multiple domains would require some reworking of how we configure Caddy and Traefik, but it could be argued that multi-domain support could also be useful to even one wiki.
  • The init script and maintenance scripts would need to be run for each wiki, with the wiki ID parameter passed into each invocation (i.e. the --wiki flag when invoking maintenance scripts). This should just be a matter of iterating over the available wikis in the Canasta instance. I could be missing something here, but I don't see much else standing in the way.
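The per-wiki maintenance idea in the last bullet can be sketched as a small shell loop. The wiki IDs and the choice of update.php are hypothetical placeholders, and the sketch only prints each command rather than executing it:

```shell
#!/bin/sh
# Sketch: run a maintenance script once per wiki in the farm.
# The wiki IDs below are hypothetical placeholders; a real init
# script would read them from the Canasta configuration.
run_for_all_wikis() {
    for wiki in wiki1 wiki2 wiki3; do
        # In a real setup this line would execute the script, e.g.:
        #   php maintenance/update.php --quick --wiki="$wiki"
        # Here we just print the command for illustration.
        echo "php maintenance/update.php --quick --wiki=$wiki"
    done
}

run_for_all_wikis
```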

farm mode does not seem in line with Canasta's image of being a simple-to-set-up, user-friendly thing

Becoming too complicated is definitely a legitimate concern. I certainly favor some way of keeping balance so that Canasta remains easy to use for single-wiki use cases.

@nik-55

nik-55 commented Mar 5, 2023

@jeffw16
Hello, I'm Nikhil Mahajan. I'm interested in working on this issue as part of GSoC. Where should I discuss the project and setup-related topics? I have already contributed to Wikimedia, where five of my patches have been merged, all related to Wikimedia extensions.

@FelipoAntonoff

Does anyone know where to follow the progress of this feature? The ideas are excellent, even more so if Traefik is used instead of Caddy; there is even an issue about that.
I ask because I don't know whether to wait or to run each wiki in its own container for the time being.

@yaronkoren
Member

@FelipoAntonoff - this project was just accepted to the Google Summer of Code last week, to be done by @chl178 : https://phabricator.wikimedia.org/T333773

So, the plan is for this to get done this summer. I don't know if there will be one place where one can see the progress, but hopefully by September or so there will be support for multiple wikis/wiki farms.

@FelipoAntonoff

Thanks @yaronkoren for the information; this project will be very interesting, and I'll keep an eye on it. It should help a lot with the adoption of MediaWiki, because in many use cases the ideal is to have N wikis to separate private, public, or other projects. MediaWiki itself should focus more on supporting this natively in a more automated way, but Canasta will already help those who want to automate this process using containers :)

@yaronkoren
Member

It's well overdue, but I'm closing this issue now!

9 participants