-
Notifications
You must be signed in to change notification settings - Fork 11.4k
[9.x] Adds support for Parallel Testing #35778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[9.x] Adds support for Parallel Testing #35778
Conversation
@nunomaduro This looks good. How would in-memory SQLite databases be handled? Would there be an in-memory database per process, or a single in-memory database for the test run as a whole? |
@martinbean When using SQLite, it's an in-memory database per process. |
Love to see work in the area, great idea!
…
These statements are slightly contradicting, but the detailed explanation makes it clear it's "per test class". Please allow me to share some findings into this topic I had recently. I found out out that this will not scale well when using real databases, at least I can share this with Postgres 13. I know this because I attempted this already in a big project with lots-of-integration tests and conducted this recently, i.e. not even a month ago. I tinkered and tuned my approach with paratest and found out to get the optimum of performance:
This means:
Anything which, with real databases, creates / drops schema "per test case" or "per test class" will not yield the expected performance improvements because the setup costs are generally too high. I tested this with:
None of this worked properly, only when used as an initialization step for "phpunit paratest worker" (in which case which approach did not matter much, because usually the "processes to tests" ratio is insignificant for that work). The transactions are required for proper isolation but still can have side effects (e.g. postgres does not roll back sequences like MySQL does its auto-increment). Note: will likely no impact sqlite as much (did not test) and, well for non-db tests, this is a non-issue anyway. Hail them :) Than you, much appreciated 🙏 |
I too was confused by the wording regarding creating a database per test class. Why would this be necessary? Wouldn't you only need a database per process? |
@mfn Thanks for your feedback. At this time, this pull request only acts on test suites that are using the "RefreshDatabase" trait. Why the "DatabaseTransactions" trait would equally need different databases? The changes never get committed, so technically there is no issue with sharing the same database between processes right? @taylorotwell I've updated the issue description about your concern. Indeed, at this time, a temporary database is created per each test case class. And, the database name will contain a unique id - associated with the process consuming that test case class - to ensure the database name is unique at that specific time. Note that there is no performance issue on this, as Yet, if what @mfn says is true, and the PostgreSQL driver requires different databases while using the "DatabaseTransactions" trait, that may justify indeed creating a database per process, instead of creating a database on each test case class. |
Correct me if I'm wrong @nunomaduro, but we only need to recreate the database for each test because the state of the So, besides that, I think we would only need to clean up the databases after the test run (assuming we won't use the |
Basically I, painfully, had to realize it simply won't work "stable" due to dead locks which eventually arise and are hard to debug. Having a dedicated database side tracks the problem. I did try the "single database" approach before that, but that was years ago and with that knowledge, I never attempted it again but got it working in a stable way with the "database per worker approach" so I pursued that approach further with success. According to this experience, everything which truly concurrently accesses the database, even with transactions, will eventually start to show flaky tests due to this, so I did not bother to investigate this again. This may be specific to the way what the application / tests really do within the database so it might be specific to a certain project; at this point all I can say is that separating the databases is the most compatible way but unfortunately adds complexity to manage this. But basically, if I understood you correctly, it's the reason why SQLite will work because it implicitly works that way due to being in-memory per-process. |
Yeah, I would definitely recommend always having a separate database per test process. Also, it should be regardless of what database refresh trait is used. |
@nunomaduro @taylorotwell Nice work. Another concept you may wish to investigate to improve test-speed which can run along-side parallel testing is to re-use databases between test-runs. The code-distortion/adapt package does this and might give you some ideas. It builds your database/s only on the first run, skipping the need to do so again until it detects changes to your migrations, seeders etc. Subsequent test-runs start straight away. (disclosure: I wrote this package). It supports other techniques like taking snapshots that are imported instead of running the migrations and seeders, and has extra features like support for Dusk, allowing for multiple databases (eg. if your project uses multiple dbs at once) and running different seeders for different tests. Apart from parallel testing, the the biggest speed benefit would be the database re-use. |
@mfn Thank you for sharing your feedback about the usage of the As @taylorotwell mentioned, we are going to adapt the implementation to create a database per test process - that should be used during the lifecycle of the process - regardless of the used database testing trait. |
@nunomaduro great job on the PR. I’ve been using paratest for a few years now and it’s greatly improved the speed for tests in a big app. I wanted to suggest that you look into using a prefix for the tables instead of a different database for each test. Not sure why that may or may not be better, but I just wanted to add it to the conversation. |
IMHO this makes things much more complex, as table names are literally defined in the migration files and have to "manipulated" for each? Also this can cause troubles with models and their expectancy that the table is the same name as the model, it would either a) barf immediately or b) create failed tests with custom queries or queries using FQN. I would there for suggest not to do this: a separate database is a much better isolation. |
Apply fixes from StyleCI
…aduro/framework into feat/adds-parallel-testing
This is looking really solid now. Amazing work! 🚀 |
Apply fixes from StyleCI
I'm not sure if my project will work with Laravel 9 already, but I can offer to test this PR with Postgres 13; just let me know when its ready for it. |
@mfn It's ready. You run a test, thanks! |
…aduro/framework into feat/adds-parallel-testing
Apply fixes from StyleCI
It should be pointed out that for this to work you must not have moved / replaced the trait Should this be configurable? |
Parallel Testing with processes, works better for me. ✌️ |
This pull request adds support for Parallel Testing in Laravel 9.
Docs: https://github.com/laravel/docs/pull/6737/files.
How you can help
You can help testing this pull request by running tests in Parallel with your test database driver:
Overview
As you may know, Laravel 5.7 introduced the
php artisan test
command that you may use to test your applications. In this pull request, we introduce the--parallel
option that you may use to run your tests in parallel:Of course, by default, your application may not have all the dependencies needed by this feature - such us
brianium/paratest
- so we will confirm with the users if we can install the dependencies for them:Finally, the results are pretty impressive. As an example, running the Laravel.io test suite, we went from 90 seconds down to 19 seconds. Almost 5x faster in my 6-Core Intel Core i7. 🚀🚀🚀
In addition, this pull request introduces a Parallel Testing API that is used internally by the pull request, but also can be used by applications to ensure resources are unique per process or per test case.
Implemention
The implementation of this pull request consists of modifying the framework's code so tests can be run in Parallel. Behind the scenes, the
php artisan test --parallel
command, will runLARAVEL_PARALLEL_TESTING=1 ./vendor/bin/paratest
andLARAVEL_PARALLEL_TESTING=1
will instruct the framework that tests are being run in parallel.Pull request that introduces the
--parallel
in thephp artisan test
command: nunomaduro/collision/pull/168.Of course, by default, the env
LARAVEL_PARALLEL_TESTING
does not exist, so the code introduced in this pull request won't change a thing for people who don't want to use parallel testing.Now, because PHP lacks native threads, the typical way of achieving some level of concurrency is using processes - like Laravel Queues. Therefore, we are using Parallel Testing with processes. By default, we will use the number of CPU cores as the number of processes. Of course, people can tinker with this value like so:
php artisan test --parallel --processes=8
Keep in mind, when we talk about parallel testing with processes, is just about running
./vendor/bin/phpunit
on each test case, at the same time, and aggregate the results at the end. And this is what thebrianium/paratest
dependency does for us.To ensure database tests won't collide , each process works with its own test database, regardless of what database refresh trait is used. Test databases are created on the first time the process uses a database testing trait.
Also, test databases persist between
php artisan test --parallel
executions. Of course, you can always refresh - drop and re-create - those databases using the--recreate-databases
option:php artisan test --parallel --recreate-databases
In addition, temporary databases are suffixed with the number corresponding to the process. For example, if you have 2 processes, each process will create
your_database_test_1
andyour_database_test_2
respectively. Keep in mind, you still need an active database connection to theyour_database
, and a user that have permissions to create a databases.Of course, the same goes for the Storage::fake, while using
Storage::fake('local')
, we create multiple local storages like so:local_test_1
andlocal_test_2
.