QGIS Quality Assurance methodology and infrastructure #180
Comments
Great proposal. Of all the grant proposals submitted this year (and many of them are excellent), in my personal opinion this is the most important one. I think we should invest more funds in the future towards a better testing infrastructure, and also fund the follow-up work necessary to actually do the testing.
I'm curious whether you could link to some recent bug reports you think would have been caught by this setup. I know in the past (years ago) we've had releases with really bad showstopper bugs on initial release, but I honestly can't think of any in recent years. All the bugs we get now are really quite involved, which makes me wonder whether this setup would need thousands and thousands of user tests to actually have caught anything...?
@nyalldawson yes, you are completely right: thanks to the tests we already have (and that grow in number every day) we have avoided some major disasters like the ones that happened from time to time in the past, and QGIS is stronger than it has ever been. This proposal targets deep manual testing of new functionality, but also some basic core functionality that should never fail or regress. A few recent examples (that seem not to have been caught by automated tests?): qgis/QGIS#36689. Of course anyone can always argue that some functionality "is not important", but the overall goal here is to not leave anything uncovered, one way or the other.
Thanks @gioman! Just thinking aloud here, please correct me if I'm wrong anywhere or have misinterpreted the proposal: looking at these two, they are better candidates for unit tests as opposed to user-run tests. Specifically, qgis/QGIS#36689 relates to a crash when a certain type of raster dataset is loaded -- this should be caught by unit tests instead. It's unlikely that a user test would help here, as the bug was only found when this data type was loaded following the introduction of the new provider. For a user test to have picked this up you'd be relying on the user test suite including a sample of this data type (and as soon as you added this user test, you'd pick up the bug immediately!). Similarly, I find it extremely unlikely that qgis/QGIS#35671 would be helped by a user-run test. To trigger this you'd need to use the tool on a layer without z values present, saving to a provider which is strict about the presence/absence of z values. So, in order to catch this, we'd have required:
Let's be conservative and drop this to a best-case scenario of 60 tests before the regression is flagged. I.e. to ensure that we'd have caught this particular bug we'd have to run at least 60 tests in different combinations for every release. That's an extreme amount of volunteer power! Alternatively, a unit test would be more suitable to cover this particular case -- the one-time investment in writing the test means that there's no longer any chance of it slipping through, and no volunteer time is required.

This is indeed a good candidate for user testing (i.e. performance regressions), thanks for pointing this one out. That said, I'm still skeptical that we have the capacity to flag regressions like this via user-run tests. In order to find this one you'd have to have a test which requires the user to load a huge table, and then open the attribute table and trigger the interactions in a certain order. And then they'd have to time this and know what the expected length of time for the task to complete is (if not, they'd likely just think the slowness was expected).

Off the top of my head, I'd estimate that running through a set of user tests covering the attribute table/form functionality in order to catch something like this would take at least 2 hours (please let me know if you disagree here!). That's (at least) 2 hours for one component of QGIS for every release we do. It's not a big jump to estimate that a set of user tests giving decent coverage of the fundamentals of QGIS would require in the order of 100-200 hours of work per release. I just don't see us having the volunteer power to make this feasible. As much as I love the idea in principle, I think in reality we're better off spending the effort writing regression tests which are run automatically by the CI and focusing our efforts there...
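For illustration, here is a minimal sketch of the kind of regression unit test described above, using QGIS's `qgis.testing` helpers; the dataset path is a placeholder standing in for a sample of the problematic raster type.

```python
# Minimal sketch of a regression unit test for a raster-loading crash,
# in the style of QGIS's own Python test suite. The dataset path is a
# placeholder for a sample of the problematic raster type.
from qgis.core import QgsRasterLayer
from qgis.testing import start_app, unittest

start_app()  # spin up a headless QgsApplication for the test run


class TestRasterRegression(unittest.TestCase):

    def test_load_problem_raster(self):
        # Loading this dataset must not crash and must yield a valid layer.
        layer = QgsRasterLayer('/path/to/sample_dataset.tif', 'sample')
        self.assertTrue(layer.isValid())


if __name__ == '__main__':
    unittest.main()
```

Once such a test is merged, the CI re-runs it on every commit, so the one-time writing cost replaces the per-release manual check.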
The examples provided by @gioman might not be the best examples, but I can assure you @nyalldawson that there are still numerous issues that aren't covered by unit tests. I remember, for example, quite a few issues in the attribute tables and forms that can only be detected through user testing (like putting selected features on top, relation reference issues, etc.). @nyalldawson - if you insist, I could list several other issues I/we reported in the past that hadn't been covered by unit tests - and it would be hard to get them covered through tests. Mainly in the areas of editing, node editing, snapping, forms and the attribute table. But I totally agree that efforts should be made to improve unit test coverage where possible. And QGIS did improve a lot due to the increased test coverage. As far as I know, both @gioman and @SrNetoChan had been involved with user testing at Boundless and probably have quite some experience in this area. I agree with @nyalldawson that the user testers should keep an open eye when they discover issues, to assess whether the case they just discovered could be secured by a unit test.
@andreasneumann no, in fact these were the first 3 that came to my mind without any search ;)
Oh, I totally agree with that!
Right -- but my concern is that in order for these to be tested, someone would have to first create a user-run test for them and then rely on users to run this test for every release. If there's no user-run test covering the particular set of circumstances required to trigger the issue then obviously it still won't get caught (just like if there's no unit test covering it). And I'm concerned that in order for this set of user-run tests to be meaningful, it would have to be absolutely mammoth. A large number of developers + users DO run nightly releases as their main releases, so we do quickly pick up regressions in basic QGIS functionality (such as if a menu option stops doing anything). Accordingly, these tests would need to cover all the uncommon user operations to be valuable -- and to cover all these uncommon operations is such a ridiculously huge task that I question whether there's going to be any real-world benefit in the end. If, after this is done, we end up with say a set of 200 tests covering things like:
That's why I estimated we'd need tests which take 100-200 hours per release in order for these extra user-run tests to have any real benefit in the end. And that's a HUGE time commitment! Don't get me wrong: I'm all for greater testing and stability. But I just don't see how this approach can be effective for a project like QGIS. Sure, if we had 200 hours' worth of tests and paid staff to run them through every release, then there'd be no harm (and potentially a lot of benefit). But if we split that effort and spent a fraction of that time writing extra unit tests + documentation, we'd get much better value for the effort...
Having seen both sides (as an ex-Boundless team member and as a unit test writer) I see the pros and cons of both approaches. I totally share @nyalldawson's concerns about the amount of work required to create and run these semi-automated test cycles, but there are a few things that need to be considered:
That said, I also have concerns about the sustainability of such a big effort in the long run; I see a risk of a lack of resources to maintain the cycles and to run them. IMO, before we embark on this we should carefully assess these potential issues.
Perhaps we need to define the areas where user testing makes the most sense. The whole editing section (node tool, construction tools, splitting, merging, etc.) is certainly an area where this would make sense. In our experience we also (still) have a lot of issues with forms, and with how forms interact with constraints and PostgreSQL transaction mode. Frustrating things, where the user edits some features and at the end is greeted with a message that the features can't be saved to the DB because somewhere a constraint is not met. Or you copy/paste a feature's geometry from a different layer and are immediately greeted with a message that the constraints are violated, before you have a chance to edit the attributes in the form. If we restrict the user testing to certain areas as a start, I think we would add a lot of benefit without spending an awful lot of resources.
@nyalldawson I can see your concern about this being a potential "bottomless pit" (just like the bug fixing, where >50% of our funds are currently spent), but I think it is worth a try. And with the former Boundless employees we have people with quite some experience and background in this area. I think we should give this a try and then evaluate after some time: what went into it, and what came out of it.
Hi @nyalldawson, I fully understand your concerns. We do need to be pragmatic and assertive if we want this test cycle methodology and infrastructure to be useful, so I enjoy the discussion. If there's no buy-in from the community, then we should just forget about it. This kind of "manual" testing should never replace unit tests; it should serve as a complement in areas where it's harder or even impossible to test with unit tests. Things like human interaction with the interface, packaging, and integration tests where you need to connect to other services like PostGIS, GeoServer, and so on. Like you predicted, even "only" those non-unit-testable scenarios can reach hundreds of hours of testing, which may not be possible to run manually for every release. There are a couple of things we can use to make it more feasible:
Although people do use nightly builds and do some random testing, it's not a coordinated thing; there is no way to know what has been tested and what wasn't. I have no idea of the acceptance of this, but I think it opens the door to yet another set of non-coding activities that common users (or maybe power users) can perform to help the project, beyond documentation and translations. I know a few QGIS-PT folks that I am sure would like to participate.
One thing I forgot to explain about the tester plugin: the URLs that show at the beginning -- we can have multiple ones if we want to run the same integration test against different endpoints/versions. The tester plugin is a massive help if anyone has a bunch of workflows saved as models and wants to make sure that they will work in a specific release.
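As a rough illustration of that last point, here is a hedged sketch of running a saved Processing model headless against a release candidate; the prefix path, plugins path, model file and the 'INPUT' parameter name are all placeholders and would need adjusting for a real setup.

```python
# Hedged sketch: verify that a saved Processing model still runs in a
# release-candidate build, without opening the GUI. All paths and the
# parameter name below are placeholders.
import sys

from qgis.core import (
    QgsApplication,
    QgsProcessingContext,
    QgsProcessingFeedback,
    QgsProcessingModelAlgorithm,
)

QgsApplication.setPrefixPath('/usr', True)  # adjust for your install
qgs = QgsApplication([], False)
qgs.initQgis()

# Register the Processing framework so the model's child algorithms resolve.
sys.path.append('/usr/share/qgis/python/plugins')  # adjust for your install
from processing.core.Processing import Processing
Processing.initialize()

model = QgsProcessingModelAlgorithm()
assert model.fromFile('/path/to/workflow.model3')  # placeholder model file

results, ok = model.run({'INPUT': '/path/to/test_data.gpkg'},
                        QgsProcessingContext(), QgsProcessingFeedback())
print('model ran OK' if ok else 'model FAILED')

qgs.exitQgis()
```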
I really think acceptance/integration tests will catch a lot of global issues unit tests can't catch, and then we will be able to push unit tests. So, +1 for me, though we need to settle on a solid and long-term funding solution for the brave hearts that will run those -- just like we need to do for bug triaging and review. I firmly think the budget growth should go, as a priority, to those recurring tasks.
QGIS Enhancement: QGIS Quality Assurance methodology and infrastructure
Date 2020/05/23
Authors
Contact alexandre dot neto at cooperative dot net
Maintainer @SrNetoChan
Version QGIS 3.18
Summary
This QEP aims to create the necessary infrastructure and methodology to organize and encourage systematic testing before each QGIS release:
Not covered by this grant funding, we also plan to:
Introduction
In the last years, the QGIS project has taken important steps towards improving QGIS stability. These include regular, long-term, and point releases; one-month feature-freeze periods with funded bug squashing; larger unit test coverage; and continuous integration.
Unfortunately, one of the weak points has been the lack of sufficient user testing during the feature freeze period, which may lead to releases with too many unknown bugs. These bugs are only found when general users start to use the new stable version.
Without an organized effort, it's hard to predict how many users will test each release candidate and which features they will test.
Also, in the current situation, it's possible that QGIS support service providers are already doing some internal QA. But without communication between them, it's hard to avoid duplication of work.
Proposed Solution
This proposal aims to create the necessary infrastructure and methodology to organize and encourage systematic testing before each release.
With this work, we hope to set the foundations for having a shared testing effort.
1. Set up a test management system to organize test cycles and to assign and track test execution
To encourage systematic testing and track its progress, we need a place to manage the test cases, describe their steps, assign the tests as tasks to testers, and log the results. We are thinking about using Kiwi TCMS.
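For illustration, a hedged sketch of how test runs might be driven programmatically through Kiwi TCMS's XML-RPC API, using the tcms-api Python client; the summary, account name, and the plan/build IDs below are placeholders, and the exact field schema should be checked against the Kiwi TCMS RPC documentation.

```python
# Hedged sketch: creating a release test cycle in Kiwi TCMS over its
# XML-RPC API with the tcms-api client (pip install tcms-api). Connection
# details are read from ~/.tcms.conf; all IDs and names are placeholders.
from tcms_api import TCMS

rpc = TCMS().exec  # RPC proxy to the Kiwi TCMS server

# Create a test run for a release-candidate build of an existing test plan.
run = rpc.TestRun.create({
    'summary': 'QGIS 3.18 RC1 - core functionality cycle',
    'manager': 'tester@example.org',  # placeholder account
    'plan': 1,                        # placeholder test plan id
    'build': 1,                       # placeholder build id
})

# Attach the plan's test cases to the run so testers can log results.
for case in rpc.TestCase.filter({'plan': 1}):
    rpc.TestRun.add_case(run['id'], case['id'])
```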
2. Elaborate and document a testing methodology to help testers
We need a document that explains each step of the testing process, from creating new test cases to setting up a clean testing environment.
3. Resurrect the QGIS tester plugin, move it to the QGIS repositories, and publish it in the official QGIS plugin repository
The QGIS tester plugin, originally created by the QGIS team at Boundless Spatial, allowed running automated and semi-automated tests within QGIS.
Tests are written in Python and can be installed in the tester plugin as plugins, or shipped inside an existing plugin (to test that plugin's functionality).
There are two types of tests:
There is already a set of automated and semi-automated tests that can be used as examples:
https://github.com/qcooperative/qgis-core-tests
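For a flavour of what such a test might look like, here is a hypothetical sketch of a semi-automated test definition, loosely modelled on the original Boundless plugin; the `qgistester.test.Test` import and the `addStep` API shown are assumptions and may differ in the resurrected plugin.

```python
# Hypothetical sketch of a semi-automated test definition, loosely modelled
# on the original Boundless tester plugin. The qgistester import and the
# Test/addStep API are assumptions and may differ in the revived plugin.
from qgis.core import QgsProject, QgsVectorLayer

from qgistester.test import Test  # assumed module path


def _load_sample_layer():
    # Automated step: load a known sample layer into the current project.
    layer = QgsVectorLayer('/path/to/sample.gpkg', 'sample', 'ogr')  # placeholder
    assert layer.isValid()
    QgsProject.instance().addMapLayer(layer)


def functionalTests():
    # A semi-automated test mixes automated steps (with a function) and
    # manual steps (description only, confirmed by the tester in the GUI).
    test = Test('Load and render a GeoPackage layer')
    test.addStep('Load the sample layer', _load_sample_layer)
    test.addStep('Verify the layer renders correctly in the map canvas')
    return [test]
```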
4. Create an initial set of relevant test cases
In software like QGIS, it's nearly impossible to test every single piece of functionality. Therefore, we need to be prudent and choose a realistic set of test cases. Besides realistic, they need to be relevant to the overall stability of the software. We should aim for broadly used functionality with a high risk of issues or regressions (like new features or refactored code). We should focus on tests that are hard to cover with unit tests. For example:
5. Organize and execute the initial test cases for the next releases (3.18, 3.20, 3.22)
We propose to execute the tests ourselves, for a few releases, on Windows and Linux. During that time, we will try to attract more testers interested in helping on these platforms or in running the test cycles on other platforms.
Affected Files
NA
Performance Implications
NA
Further Considerations/Improvements
(optional)
Backwards Compatibility
NA
Votes
(required)