Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coding style guide #95

Merged
merged 24 commits into from
Dec 19, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
fix
  • Loading branch information
jlamypoirier committed Dec 18, 2024
commit 401b5b29b67b1e9643749bc667e0b97d76b00b04
4 changes: 4 additions & 0 deletions docs/developers/style-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ pre-commit install
Names should be as self-explanatory as possible, within reason.
This includes python identifiers (classes, variables, methods, modules, etc.), file names and configuration parameters.
For example:

* Avoid abbreviations, especially domain-specific ones. Ex. `bs -> batch_size`.
* Avoid meaningless variable names such as `x`, except for generic methods.
* Avoid redundancies especially for configuration parameters, ex. `data.data_type` -> `data.type`.
Expand All @@ -46,6 +47,7 @@ for example configuration parameters and the public interface of core classes an
## πŸ›¬ Imports

We use the following conventions for imports (other than those enforced by isort):

* Import standard library and third party modules by module (ex. `import package.module`, not `from package.module import method`).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why prefer import package.module universally? Fast-LLM is the only library I know that does that. I think this needs to be well motivated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a personal preference, but Fast-LLM doesn't use that many external libraries and nearly all of them keep module identifiers short so it's not a big difference. Main benefits are:

  • Consistency: we're not doing things different for every file depending on who wrote it.
  • We know what package an identifier comes from with just a quick look at the code, no digging at all.
  • It's especially useful for commonly used names that could come from multiple packages, or that are also present in Fast-LLM (one example: Fast-LLM's config has many names in common with dataclasses, omegaconf and pydantic)
  • We could allow exceptions, but this would be subjective and hard to manage.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spelling it out. Could you add points 1-3 to the doc, please?

This keeps identifier's origin explicit.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the expense of very long identifiers in the codebase, which could be alleviated with as imports, but that is not allowed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hasn't really been a problem so far. Almost every identifier shows as package.identifier, so isn't really long.

* Avoid renaming with `as`, except for some (arbitrarily chosen) common ones: `numpy as np`, `triton.language as tl`.
Expand All @@ -70,6 +72,7 @@ If an offending import is only required for a type hint, include it in a `if typ

Although good practices of object-oriented programming are generally ignored in python,
Fast-LLM attempts to follow them to an extant, while avoiding unnecessary bloat:
jlamypoirier marked this conversation as resolved.
Show resolved Hide resolved

* Mark private and protected variables with an underscore `_` prefix.
As is customary in python, we make no distinction between the two and avoid the double-underscore `__` notation.
* Keep public interfaces (methods and variables without underscore prefix) as lean as possible.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help if it was explained/exemplified what should remain public vs. private

Expand All @@ -80,6 +83,7 @@ usually to define read-only public variables.

Fast-LLM uses type hints for several reasons, including code readability, type checking in IDEs,
and type validation for configurations:

* Always use type hints for the public interface of a classes and modules.
Type hints for method outputs may be omitted if they can be easily inferred.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be more precise here? What's "easily inferred" depends on the person who writes/reads the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really know what to put there. So far I've mostly ignored output type hints except for abstract classes and when pycharm gives me trouble, but I think we need a better rule

* Prefer using type hints in private interfaces, especially if it improves readability and/or static type checking.
Expand Down
Loading