Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactors src/package.rs #132

Merged
merged 15 commits into from
Oct 20, 2023
Merged

Refactors src/package.rs #132

merged 15 commits into from
Oct 20, 2023

Conversation

xfbs
Copy link
Contributor

@xfbs xfbs commented Oct 18, 2023

This PR refactors the module package in the buffrs CLI. Since I am looking to integrate the validation crate with buffrs, I saw some low-hanging fruits for code improvement. On a high level, the improvements introduced by this PR address the following:

  • async correctness: not using blocking code in the path of async code. This is quite easy to get wrong in Rust, as of yet there is not much compiler help to avoid this (to my knowledge). In certain cases, it can be okay to mix the two (using a blocking mutex in a hot loop, writing a CLI that is single-threaded). But to build robust, reusable code we should try to get this right.
  • error handling: error handling is something that can always use some love. This PR adds some error types in strategic places, mostly to make testing easier.
  • test coverage: this PR attempts to increase net code coverage to 80%.
  • encapsulation: trying to not leak implementation details when it is not necessary to leak them. This allows us freedom to change the implementation without breaking too many things.
  • robustness: trying to avoid hidden assumptions by making them explicit. One example for this is the reliance on the current working directory in some of the code paths.

I try to keep the changes in here to be mostly non-opinionated, so that they are uncontroversial.

Edit: this PR is now ready. Here is a list of the major changes I have made here:

  • Consolidated the impl PackageStore blocks. There were multiple of them, I just squashed them together into a single one.
  • Removes superfluous TryFrom implementations. TryFrom and FromStr are handy, but we should only implement the ones we need. For cases where a string slice suffices, FromStr is the way to go. For cases where ownership is needed, TryFrom<String> is the way to go. Always prefer TryFrom<&str> over TryFrom<&String>, because the former is more generic (the latter is just one specific container type for a &str), but usually neither are useful because what you really want is FromStr. Here I just removed some implementations that we are not using and that do not add value.
  • Implements custom error type for PackageName parsing. Here I implemented a custom error type, PackageNameError, using thiserror. This error type helps write useful unit tests, miette is neat for displaying but not so much for "low-level" errors if that makes sense.
  • Changes Deref of PackageName to target str. In general, similar as with the TryFrom implementation, always prefer &str over &String. The reason is that &str is a generic string (can be backed by any container). Here, I just swapped out the Target of the Deref and made it work.
  • Moves resolving code into src/resolver.rs: in having one module do one thing, I felt that resolving was different enough to warrant putting it into it's own module.
  • Derive Default for PackageType. This type already had a manually-implemented Default, but I always prefer derive implementations over hand-rolled ones unless absolutely necessary, for maintainability and brevity reasons.
  • Don't use unwrap_or_default: Rust has amazing error handling, but there are cases where things actually should panic. This is one of those, so I have switched to using unreachable rather than silently converting to an empty string. If this ever breaks, we certainly want to know about it!
  • Removes From<&Package> implementation: this implementation was not used and does not add any value, because the data is available in a public field.
  • Add unit tests for PackageType: this cleans up PackageType a bit, adding unit tests, making the code a bit clearer.
  • Remove implicit assumptions from PackageStore: The PackageStore currently heavily relies on the current working directory. I do not like using mutable global state, and also for testability purposes this is not great. So instead I added a field that contains the root path that all relative paths are resolved using.
  • Don't read Manifest in PackageStore::release: in the interest of decoupling things, we don't really want the PackageStore to have to interact with manifests (such as reading them). Rather, we want to decouple them by having it such that the Manifest is a method argument that we simply pass through.
  • Decouples PackageStore from compression: similar as with the Manifest, our PackageStore should only ever deal with package storage things and not with compression. So I have moved the compression-related code out into Package instead, this decouples things.

Before:

test-coverage-before

After:

test-coverage-after

I did not quite get the test coverage up to 80%, but we did hit about 77% (with some cheating because I have moved out the resolver code).

This PR turned out rather wordy, but I hope it all makes sense! I think in a next iteration I could tackle the testing situation a bit more.

@xfbs xfbs self-assigned this Oct 18, 2023
xfbs added 2 commits October 18, 2023 16:37
There are only two implementations which semantically make sense:
- TryFrom<String>, because it moves the underlying string and thus
  avoids a Clone.
- FromStr, because it is commonly used for parsing. It takes a
  reference and thus requires a Clone.

The other implementations do not add any value and are unused, for that
reason they should be removed.
@xfbs xfbs force-pushed the pe/package branch 3 times, most recently from 8922567 to ad6a142 Compare October 18, 2023 20:38
@xfbs xfbs marked this pull request as ready for review October 18, 2023 20:38
@xfbs xfbs added complexity::low Issues or ideas with a low implementation cost component::cli Everything related to the buffrs cli type::refactoring Changing the inner-workings of buffrs labels Oct 19, 2023
Copy link
Contributor

@mara-schulke mara-schulke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall i like the split of "resolver" vs "package" – i would strive for as functional / pure designs as possible (i know that is not the case right now due to pragmatism) – but if we refactor stuff, i would want to move away from oop / imperative code as much as sensibly possible

src/package.rs Outdated Show resolved Hide resolved
@xfbs
Copy link
Contributor Author

xfbs commented Oct 19, 2023

Discussed this today, the resolution here is that I will add a context to the main.rs which contains an instance (perhaps an Arc<PackageStore>) to avoid instances of this being created all over the place. Most of the code should have access to this context struct and it can be used to add other shared, immutable state.

Besides that, perhaps in a future PR, we should try to make sure that we don't mix pure and non-pure code all over the place. To some extend, this PR already addresses that, but there is still opportunities to improve that in future PRs.

src/command.rs Outdated Show resolved Hide resolved
src/command.rs Outdated Show resolved Hide resolved
src/command.rs Outdated Show resolved Hide resolved
src/command.rs Outdated Show resolved Hide resolved
src/package.rs Outdated Show resolved Hide resolved
src/package.rs Outdated Show resolved Hide resolved
src/package.rs Outdated Show resolved Hide resolved
src/package.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@mara-schulke mara-schulke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So overall, yes we can merge this, but i found a few nits regarding style, and a few issues in terms of unnecessarily moved code, leaking of impl details (see the error related comments, maybe take a look at #95 and #103 for ref) and ergonomic issues (especially the into_diagnostic thing)

xfbs added 11 commits October 20, 2023 10:34
Parsing is a situation where it is often very useful to have custom
error types. The reason for this is that it makes it simpler to write
useful unit tests that catch specific parsing errors. In this commit,
the `PackageNameError` type was introduced which catches the three error
kinds that can occur when parsing a package name.

Additionally, unit tests are added that trigger each of the
aforementioned error cases.

Finally, the `Debug` trait is implemented by derive macro rather than
manually, as there does not seem to be a good reason for doing it
manually.
The `Deref` trait allows for types to automatically downcast to another
type. It is one of the few places where the compiler tries to be smart,
rather than requiring the programmer to be explicit.

Usually, this trait is used to make custom types behave like more
generic types. In this case, the `PackageName` type wraps a string, and
should behave mostly like one.

However, the appropriate type to target here is not `String`, but `str`.
Why is that? Both are stringly typed, but there is an important
difference. `str` is the most generic type — it is the equivalent of a
UTF-8 encoded unsized byte array. `String` is one possible container
type for `str`, but not the only one. In fact, the choice of the
container type is arbitrary and an implementation detail.

In the interest of building abstractions but being flexible, any types
which implement `Deref` to some string type should dereference to an
`&str` rather than a `&String`.
One module should do one thing. The `package` module defines types and
methods to work with packages and the package store. Previously, it also
contained code to resolve package dependencies, but this commit moves
that logic out into it's own module since it is sufficiently separate
and self-contained that in the interest of encapsulation, it should be
considered a separate thing.
There are different kinds of errors that can crop up. Some errors need
to be neatly returned in a way that they can be handled, some errors
need to be passed through to be displayed.

In the case of the `to_str()` method converting a `PackageType` to a
string, this is not something that should reasonable ever fail, because
`PackageType` is an enum. However, due to this API being built such that
it accepts any type that implements `Serialize`, the function returns a
`Result`. The important thing here is that:

- This should never error.
- If it does error, it constitutes a major issue with the code.
- This error cannot be handled, but it must at least be communicated.

Using `unwrap_or_default()` here is a bad idea because it hides the fact
that an error has happened, if it were ever to occur. This is a very
good example of a situation where code should panic. Using the
`unreachable` macro here communicates that this is not something that
will ever error, and the panic is not left in place because someone was
too lazy to handle it properly.
Package has a public field called `tgz` which contains the compressed
archive of the package. Adding a From<&Package> conversion does not add
anything semantically, it makes more sense to use that field directly.
This adds unit tests for PackageType to test round-trip parsing and some
methods. Renames the publishable() and compilable() methods to
is_publishable() and is_compilable() for better clarity. Makes sure they
are used in the code base.
The current implementation of `PackageStore` uses paths that are
resolved using the current working directory. This approach is not a
recommended one, for multiple reasons:

- Application logic should not rely on the current working directory
  staying the same during a program run, as libraries could at any time
  change it. It is implicit global state.
- Relying on this makes it more difficult to thoroughly test the code.
- Relying on implicit global state makes it more difficult to debug some
  code, as the runtime behaviour depends on invisible global variables.

The approach here is that the `PackageStore` now instead contains a
`PathBuf` which represents the root directory from which all relative
paths are resolved. This is robust against changing the current working
directory.

This strategy is also very suitable for unit testing, as it allows
multiple `PackageStore` instances to be used simultaneously with little
effort.
The separation of concerns is a very useful technique that allows
building software that is not too tightly coupled. Software built like
this tends to be more testable, and as a result more robust.

In this case, the `PackageStore` is responsible for one thing, and one
thing only: storing packages. It should not read manifests, compress
data or do anything else that goes beyond what it was intended.

For this reason, a better design is for the `release()` method to accept
a `Manifest` that is already read. That way, this module does not need
to concern itself with how manifests are to be read.
Usually, I prefer to have simple tests (ones that do not involve any
setup) to be close to the code that they are testing, and only put tests
that require some amount of setup into a tests module.
@xfbs
Copy link
Contributor Author

xfbs commented Oct 20, 2023

I've addressed the feedback:

  • I've created some issues for some of the issues, because I do not want this PR to get any larger than it already is. Prefer to work in small chunks.
  • Some of the issues I have addressed
  • Some of the issues I disagree with, and I have left a comment explaining the rationale.

xfbs added 2 commits October 20, 2023 10:59
I do not thing that this is the correct thing to do, but for now this is
what the error handling is supposed to look like.
@xfbs xfbs merged commit 6460682 into main Oct 20, 2023
5 checks passed
@xfbs xfbs deleted the pe/package branch October 20, 2023 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
complexity::low Issues or ideas with a low implementation cost component::cli Everything related to the buffrs cli type::refactoring Changing the inner-workings of buffrs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants