This document resumes a couple of points we try to embrace in our coding style on python. Some of these points take an opinionated side on a trade-off story. The description will try to make that clear.
The driving motivation of this code style is to make your code more readable.
Readable is one word that hides several dimensions:
- the reader understands the intent very rapidly
- the reader can proofread. It can become confident that the code is correct very easily.
Noticing how the two are different should not require too much squinting. Shoot for proofreadability.
Do a pass on your own code before sending it for review to avoid wasting the review time. Also, a trivial code style issues can come in the way and avoid spotting deeper issues with the code.
As a reviewer, your first mission is proofreading. If you find a logical bug, feel good. You did an awesome job today.
Your second goal is to make sure the code quality stays high.
You can express "nitpicks": suggestions about some local aspect of the code that do not matter too much. Just prepend "nitpick:" to your comment. You can also express an opinion/advice that you know is not universal. Make sure you make it clear to the reviewee that it is fine to ignore the comment.
Do not use rhetorical questions... If you are 95% sure of something, there is no need to express it as a question.
Prefer I believe this should be n+1
to Shouldn't this be n+1?
.
The issue with rhetorical questions is that when you will have a genuine question, reviewees may over interpret it as an affirmation.
As a reviewee, if you are not used to CRs, it can feel like an adversarial process. Relax. This is normal to end up with a lot of comments on your first few CRs.
You might feel like the comments are unjustified, try as much as possible to not feel frustrated. If you want to discuss it, the best place is the chat, or maybe send a PR to modify this document.
But remember to pick your battles... If you think it does not matter much but it takes 2 secs to fix, just consider doing what is suggested by the reviewer or this style guide.
If you want to run the basic code styling checks on python (isort, flake8, mypy) execute:
make python-code-lint
And will run over all python packages to check if its ok and fix possible errors
Function and variable names are key for readability.
A good function name is often sufficient for the reader to build reasonable expectations of what it does.
If this implies long names, let's have very long names.
Trying to fit this rule has an interesting side effect. Nobody likes to type long function names. It just feels ugly. But these are frequently symptoms of a badly organized code, and it can help spot refactoring opportunities.
Use lower case and underlines for variables, classes and funcitons.
One incredibly powerful tool and simple tool to help make your code more readable is to introduce explanatory variables.
Explanatory variables are intermediary variables that were not really necessary, but make it possible -through their names- to convey their semantics to the reader.
As much as possible, do not use reuse the same variable name in a function. It is never necessary, very rarely helpful and can hurt and mypy will complain if the type is different.
Allways type your functions and variables so its easyer to read and check by mypy.
We prefer early return.
Rather than chaining else
statement, we prefer to isolate
corner case in short if
statement to prevent nesting
Inline comments in the code can be very useful to help the reader understand the justification of a thorny piece of code.
Test do not need to match the same quality as the original code.
When a bug is encountered, it is ok to introduce a test that seems weirdly overfitted to the specific issue. A comment should then add a link to the issue.
Unit test should run fast, and if possible they should not do any IO. Code should be structured to make unit testing possible.
Some of our unit tests would not be considered good unit tests in some companies, and that's ok.
Here are the controversial bits:
Our unit tests are not here just to spot regression. They are also here to check the correctness of our code.
Unit test do not only test public API. Complex code often calls half a dozen smaller functions.
The cardinality of the corner case of the complex code can make it difficult to test all corner case.
On the other hand, the smaller functions could be tested exhaustively.
For this reason, testing internal private functions is actually encouraged.
Ideally, unit tests should be testing one thing and one thing only, but if they don't and it helps cover more ground, this is ok.
Finally, unit tests are not necessarily deterministic. We really like proptests. When proptesting, make sure to reduce as much as possible the space of exploration to get the most out of it.
Testing all the architecture is complex and packages like nucliadb_one
and nucliadb_search
starts all dependency layers on a distributed way with SWIM
protocol, testing the real environment. Takes some time to fire all dependencies but provides a good real integration and functional test.
Your async code should block for at most 500 microseconds.
If you are unsure whether your code blocks for 500 microseconds, or if it is a non-trivial question, it should run via asyncio.run_in_executor
.
(thanks to quickwit for it!)