Rules to live by. Written using GitLab, git, and Jira terminology, but generalizable to any software development tools.
- Use trunk-based development and avoid multiple persistent branches
- Integrate early and often (continuously)
- Give everyone (even people outside your team) developer access and don't attempt to control what they do
  - Just protect the main branch
- Always make the change in the most-upstream repository
- Prefer Merge Requests (MRs) over Jira tickets
  - MRs allow one-click completion of the changeset if there's agreement
  - MRs allow visible discussion of the changeset alongside the code to reach agreement, or to remain in disagreement and close the MR with no action
  - MRs automatically validate the changeset by running the pipeline
  - MRs avoid the misinterpretation that often happens when playing telephone through email → chat → Jira → git
  - Jira tickets are useful as a promise to do some work in the future that we don't have time to do now
    - Only if you actually promise to do that work
- Clear and comprehensive MR descriptions are far more important than individual commit messages
  - Updating the MR description costs nothing, so keep improving it continuously
  - GIFs in MR descriptions are incredibly powerful persistent demos
  - If a Jira ticket exists, reference it in the MR description to automatically link it and optionally close it upon completion
- Make your work visible
  - Open draft MRs extremely early (even just a README update saying what you're going to do) to allow early review and course correction
  - Commit and push frequently
- Keep MRs small
- Keep branches short-lived and merge from main frequently
- Match existing code
  - Unless that code is broken or terrible, in which case update it everywhere, not just in your new code
- Drive MRs to closure before opening new ones
  - Unless you find a bug that can be spun off and completed sooner via a smaller incremental MR
- Make time to review teammates' MRs on a recurring basis
  - For example, while waiting for review of your own MR, or after you finish one and before you start the next
- Resolve MR concerns efficiently
  - Developers are empowered to resolve threads about typos when they make the fix
  - Larger, open-ended, or architectural concerns shall be resolved only by the originator of the thread
  - Reviewers should prefer the suggest changes feature whenever possible to allow quicker application by the developer
    - Or just push a commit that fixes that spelling mistake
    - But don't push big changes without talking to the developer
  - Once a reviewer approves, consider all threads they originated resolvable by anyone
  - Reviewers should consider explicitly stating "feel free to resolve" or "I'd like to review the changes" to avoid confusion
- Link liberally to other MRs and comment threads; those linkages will pay significant dividends later
- Add all stakeholders as reviewers to all MRs, even if they aren't on the team that owns that repository
- Automate everything that can be automated
  - Unless that automation is fragile and will cost more to maintain than doing it manually
- Establish coding standards early and automate enforcement
  - Don't spend any time arguing about coding standards that aren't automatically enforced
- Checklists are useful only for things that can't be automated
- Don't build or rely on anything that you aren't willing or able to maintain
  - You aren't able to maintain plain-text documentation of source code; autogenerate it instead
  - You aren't able to maintain complex shell scripts
  - You aren't able to maintain anything that doesn't have tests
- Beware the normalization of pipeline failures: always keep the main branch healthy
  - Triaging and squashing flakes is a higher priority than developing new features
- Leverage third-party tools (CMake) and third-party documentation; don't write custom wrappers (build.sh)
- Never duplicate code
  - Unless it's more complex or fragile to rely on one copy
- Never triplicate code
- No large blobs committed to version control (use Git LFS)
- No user-specific files committed to version control
- No autogenerated files committed to version control
- A clean clone provides at least one supported workflow (with IDE integrations) out of the box
- Normal development doesn't create or rely on changes to any version-controlled files that can't or shouldn't be committed
- Prefer fewer repositories to minimize cross-repo dependency management and allow atomic changes
  - One repo that submodules in 10+ others doesn't count as fewer repos
- Prefer fewer persistent branches (trunk-based development)
- Prefer flatter directory structures
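
The first three repository rules above lend themselves to automated enforcement. What follows is a minimal sketch, not a definitive implementation: the 1 MiB size limit, the filename patterns, and the function name `hygiene_violations` are all hypothetical and chosen only for illustration. The file list is passed in as `(path, size)` pairs so the check does not depend on any particular git command.

```python
from fnmatch import fnmatchcase

# Hypothetical limit and patterns, for illustration only; tune per repository.
MAX_BLOB_BYTES = 1024 * 1024
USER_SPECIFIC_PATTERNS = ["*.user", ".vscode/settings.json", "*.swp"]
AUTOGENERATED_PATTERNS = ["build/*", "*.pyc", "*_generated.*"]


def hygiene_violations(files):
    """Return a list of (path, reason) pairs for files that break the rules.

    `files` is an iterable of (path, size_in_bytes) pairs.
    """
    violations = []
    for path, size in files:
        if size > MAX_BLOB_BYTES:
            violations.append((path, "large blob (consider Git LFS)"))
        if any(fnmatchcase(path, pat) for pat in USER_SPECIFIC_PATTERNS):
            violations.append((path, "user-specific file"))
        if any(fnmatchcase(path, pat) for pat in AUTOGENERATED_PATTERNS):
            violations.append((path, "autogenerated file"))
    return violations


if __name__ == "__main__":
    example = [
        ("src/main.cpp", 4_000),
        ("assets/video.mp4", 50_000_000),
        ("project.user", 200),
    ]
    for path, reason in hygiene_violations(example):
        print(f"{path}: {reason}")
```

A check like this could run as a pipeline job so that the rule is enforced automatically rather than argued about in review.
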
This excerpt from Software Engineering at Google describes the value of a CI pipeline:
> tests derive their value from the trust engineers place in them. ... A bad test suite can be worse than no test suite at all.
CI pipelines are most valuable when they distill the results of the test suite into a single, reliable pass (✅) or fail (❌) signal that is visible at a glance without inspecting any additional logs. This allows developers to instantly understand if the code under test is working as intended or has had a regression. To maximize the value of a CI pipeline, everyone working in the repository must agree on the definition of pass and fail and the expected developer response:
- ✅ means every test that we wanted to run and expect to pass did run and did pass; the developer can take action without reviewing logs
- ❌ means the unit under test failed some test that we expect to pass, and developer action is required
  - In main, sound the alarm, pause new feature development, and get main healthy, which could mean skipping a test or marking it as expected to fail while investigating the root cause
  - In a feature branch, don't merge until you make changes and get to ✅
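
The pass/fail definition above can be sketched as a single function. This is a minimal illustration under stated assumptions: the name `pipeline_signal`, the set of expected-to-pass test names, and the `"passed"`/`"failed"` outcome strings are hypothetical, not part of any particular CI tool. Tests that are skipped or marked as expected to fail are simply left out of the expected set.

```python
def pipeline_signal(expected_to_pass, outcomes):
    """Return True (pass) only if every expected test ran and passed.

    `expected_to_pass` is the set of test names we wanted to run and
    expect to pass. `outcomes` maps test name -> "passed" or "failed"
    for the tests that actually ran.
    """
    # A test missing from `outcomes` never ran, so it must not yield a pass.
    return all(outcomes.get(name) == "passed" for name in expected_to_pass)


if __name__ == "__main__":
    expected = {"test_login", "test_logout"}
    print(pipeline_signal(expected, {"test_login": "passed", "test_logout": "passed"}))  # True
    print(pipeline_signal(expected, {"test_login": "passed", "test_logout": "failed"}))  # False
    print(pipeline_signal(expected, {"test_login": "passed"}))  # False: test_logout never ran
```

Treating a missing result as a failure is the design choice that guards against a pass being reported when some tests silently did not run.
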
Efficient development requires that ❌ not mean something squishy like "well something new may be broken, or maybe just the things that are currently expected to be broken are broken and everything else is fine, please manually inspect the logs to determine the answer."
Efficient development does not accept persistent ❌ on the main branch, since that affects and sometimes blocks all ongoing development by providing an unstable baseline for comparison. It also normalizes ❌, which undermines the team's panic reflex and delays detection of real problems.
Efficient development minimizes the use of warning-style statuses (such as ⚠️ or 🟡) as an unclear middle ground between ✅ and ❌ that does not give the developer a clear signal of what to do next. Some teams or pipelines will define and use such statuses; the process defined in this Wiki consciously avoids that complexity.
A false pass is a ✅ that doesn't mean the above, for example because some tests that we wanted to run did not run due to an error in our pipeline or test infrastructure.
A false failure is a ❌ that doesn't mean the above, for example because the CI infrastructure failed during setup without actually running the test suite, or because the only test that failed is currently expected to fail at some frequency. This can also be described as an "infrastructure failure" or a "flaky test."
Note that false failures and flaky tests are harmful because they don't mean the unit under test failed in an unexpected way, so they waste developer time and, if not squashed, erode the team's trust in the pipeline. If developers click retry (🔄) without looking at the logs, the trust is gone and the pipeline is no longer adding value.
If you have a few thousand tests, each with a very tiny bit of nondeterminism, running all day, one will occasionally fail (flake). As the number of tests grows, statistically so will the number of flakes. If each test has even a 0.1% chance of failing when it should not, and you run 10,000 tests per day, you can expect to investigate about 10 flakes per day. Each investigation takes time away from something more productive that your team could be doing.
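
The arithmetic above can be checked with a short sketch. The function names are hypothetical. The second function adds one assumption that is not in the text: that flakes are independent of each other. The 1,000-test suite size used with it is likewise only an illustrative figure.

```python
def expected_flakes_per_day(flake_rate, test_runs_per_day):
    """Expected number of spurious failures per day."""
    return flake_rate * test_runs_per_day


def chance_of_any_flake(flake_rate, tests_per_pipeline):
    """Probability that at least one test flakes in a single pipeline run,
    assuming each test flakes independently (an assumption of this sketch)."""
    return 1 - (1 - flake_rate) ** tests_per_pipeline


if __name__ == "__main__":
    # 0.1% flake rate across 10,000 test runs per day, as in the text.
    print(round(expected_flakes_per_day(0.001, 10_000), 6))  # 10.0
    # Hypothetical suite of 1,000 tests at the same flake rate.
    print(f"{chance_of_any_flake(0.001, 1_000):.0%}")  # 63%
```

Under the independence assumption, even a small per-test flake rate makes a spurious ❌ more likely than not once a suite reaches about a thousand tests.
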
Changes to pipeline definitions or the test suite itself require extra scrutiny and detailed log review because they can increase the false-pass or false-fail rate. A best practice when introducing a new test is to intentionally inject a failure in the unit under test and ensure that the pipeline catches it and reports ❌.
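
The failure-injection practice can be illustrated in miniature. This is a sketch under stated assumptions: `add`, `add_broken`, the two checks, and the helper names are all hypothetical stand-ins for a real unit under test and its tests. In a real pipeline, the injected failure would be pushed to a throwaway branch to confirm the job reports ❌.

```python
def run_check(check, impl):
    """Run `check` against `impl`; return True if it passes, False if it fails."""
    try:
        check(impl)
    except AssertionError:
        return False
    return True


def detects_injected_failure(check, good_impl, broken_impl):
    """A useful check passes on the good code and fails on the broken code."""
    return run_check(check, good_impl) and not run_check(check, broken_impl)


# Hypothetical unit under test and a deliberately broken variant.
def add(a, b):
    return a + b


def add_broken(a, b):
    return a - b  # injected failure


def check_add(impl):
    assert impl(2, 3) == 5


def check_add_weak(impl):
    assert impl(2, 0) == 2  # passes for both variants, so it cannot catch the bug


if __name__ == "__main__":
    print(detects_injected_failure(check_add, add, add_broken))       # True
    print(detects_injected_failure(check_add_weak, add, add_broken))  # False
```

The weak check shows why the practice matters: it passes against the good code, so it looks healthy, yet it would never turn the pipeline red for this regression.
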
Ensuring adherence to the above best practices and expected responses is everyone's responsibility, not just that of the Maintainer, Product Owner, Responsible Engineer, etc. If you see something, say something.