-
Notifications
You must be signed in to change notification settings - Fork 2
[ACIX-1035] Create git-related tools and helper classes
#181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
77d51c7 to
332cdf9
Compare
…Git` `Tool` class
* Move `GitConfig` class to `ToolsConfig` in `tools.py` * Update all references to `GitConfig` to use `ToolsConfig.git` instead of `RootConfig.git` * Update tests and documentation to reflect this change We should also consider moving this user/email couple to its own section instead of having it tied to `git`. `git` data could still be used as default values for this data however.
Since these methods were also used in contexts where the App is not fully initialized, i.e. during config generation, we have to implement a workaround allowing the methods to be called statically (instantiating `Git` requires passing in an `Application` object).
…ection in the config We are conceptually manipulating a "generic" username rather than something inherently attached to `git` - `git` is simply the default way by which this username is deduced. Thus we configure it as its own object, using `git` data as a default, and show a warning if the two are ever out of sync.
…nd related tool functions
…e changes * Implement `ChangeSet` class * Implement parsing function for building from `git diff` output * Include `ChangeSet` in commit details * Add `get_commit_changes` and `get_working_tree_changes` to git tool class
332cdf9 to
e68ded6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
| file: Path | ||
| type: ChangeType | ||
|
|
||
| patch: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the patch should be a property of the change but rather it would be used only to create the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should not keep track of what changes were actually made ? Or do you have an idea for a different format for storing these changes, rather than the patch lines ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think about it in terms of usage patterns and I foresee two:
- A caller only cares about the file paths that have changed, for example to know what to test on a PR. This usage should be as fast as possible and the caller only needs a sequence of
str(preferable) orPathobjects representing relative paths from the root of the repository. - A caller does care about change metadata, for example a project with a single changelog file that has a CI job fail when a PR that modifies code does not add new lines to the history. This should also be very fast but of course requires extra processing compared to the first usage pattern. In this case, there is no need to save the patch contents because in order to create this class you would have had to consume the patch text already. So it should definitely have
pathandtypekeys but never persist the input it was given. If there is more metadata that would potentially be useful, you can add that now or we can when cases arise which is probably better. I was thinking abinaryboolean and alinesstructure with the counts ofadded&removed.
src/dda/utils/git/changeset.py
Outdated
|
|
||
|
|
||
| @dataclass(frozen=True, order=True) | ||
| class FileChanges: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT?
| class FileChanges: | |
| class ChangedFile: |
| @@ -0,0 +1,3 @@ | |||
| I am file1 ! | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I like your chosen mock data 😄
- The
testdatadirectory name is 100% a Go convention, it should be called fixtures. Also we should nest deeper under fixtures perhaps with agitdirectory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, would you prefer the following file structure:
tests/
fixtures/
git/
tools/
...
cli/
env/
helpers/
...
Or this one ?
tests/
cli/
fixtures/
...
env/
fixtures/
...
...
I think both have their advantages:
- The first one centralizes all test fixtures, which makes reusing them across test modules a lot easier (like I did here, where I use some both in
tests/utils/gitandtests/tools/test_git.py) - The second one keeps the fixtures closer to the tests which actually use them, which is probably easier to reason with.
Also, what about conftest.py files ? I don't think we can move them in a fixtures/ directory, as pytests handles loading them automatically and expects them at the same level or in a parent level to the test files themselves. Unless there is a way to override that ?
…and related functions
* Use `msgspec.Struct` instead of `dataclasses` for improved performance * Remove useless `git` property from `LinuxContainer` * Avoid runtime creation of mapping for `ChangeType`, instead using cascading `if`s * Restrict `diff_output` argument of `generate_from_diff_output` to `list` instead of arbitrary `Iterable` * Move `SHA1Hash` to `commit.py` and refactor usages * Remove extraneous '"' in user config * Fix `.git` suffix removal in `Git.get_remote_details` * Revert markdown table formatting in `config.md` * Move `UserConfig` further down in config file s
* Implement improved `SubprocessRunner.static_capture` method, replacing the `dda.tools.static_tool_capture` method * Remove `tool` param from above - all calls to this function are made statically * Merge `get_author_*` and equivalent properties + required test fixes * Remove `_query_git_author_*` methods, either directly calling `SubprocessRunner.static_capture` or `git.author_*` depending on context * Simplify test logic for setting git author name
…estdata` to `fixtures`
e68ded6 to
40774fd
Compare
…thor details * Restore `tools.GitConfig` config section * Allow multiple emails in User config * Add new `author_details` field, that can be either `inherit` or `system` * In `inherit` mode, set git author details via env vars instead of manually calling `git config` * Make `author_name` and `author_details` properties purer, not reading config by themselves. * Adapt tests, creating new fixtures for both author detail modes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking a lot better! Now overall let's try to reduce use of non-primitive objects in the parameters of APIs for better performance, only returning other types when necessary.
* Remove `repo_path` arguments like suggested by review * Implement methods for automatically determining, and comparing against, a merge base. * Add method in `Git` class to capture diff with a given set of args * Cleanup an unused test fixture * Add required tests
… `dict` subclass The composition pattern is safer than direct inheritance + using structs allows for easy serialization and deserialization, which is already useful for test data but will be even more in the future, in order to store build metadata.
5f28e3f to
f8d13ba
Compare
* Move `static_tool_capture` out of `SubprocessRunner` class * Remove `SHA1Hash` class, instead using normal `str`s and validating in `Commit.__post_init__` instead * Tweak args used for diffing in `Git._capture_diff_lines` * Tweak args used for getting untracked files in `Git.get_working_tree_changes` * Use `...` diffing in `Git.get_changes_with_base` * Convert `Commit` to a `Struct` instead of a dataclass * Fix tests
f8d13ba to
77eaf99
Compare
…instead of manually diffing every file This is much higher performance, as only a fixed number of calls to `git` have to be made (considering the speed of git vs Python itself, we can probably say this makes the function O(1) instead of O(n)). Moreover, this makes the function work properly on Windows, as we do not require `/dev/null` to exist anymore.
77eaf99 to
e48f51f
Compare
This new class centralizes all operations and information relating to git remotes and, in particular, GitHub. This pattern makes using git classes much clearer and purer.
…t`... (#184) * Port src/ changes from #181 * Port `tests/` changes from #181 * fix(git): Fix tests - move fixtures and use proper author details refactor(tests): Move fixtures to `git` folder feat(git): Add `commit_files` method replacing `tests.helpers.commit_file` + test changes fix(tests): Use `default_git_author` fixture in tests * feat(git): Add `Commit.enc_hook` and `Commit.dec_hook` * refactor(git): Make `Commit` objects frozen We can actually do this quite easily by avoiding the use of `_details` and `_changes` fields, and instead using `__dict__` to store the details and changes. This is the same technique that `cached_property` uses internally, so trying to access the cached property works as expected if the details or changes have been set by one of the `get_details_from_*` or `get_changes_from_*` methods. * refactor(git): Remove `commit_files` method * refactor(git): Minor fixes * refactor(git): Rename `FileChanges` to `ChangedFile` * refactor(git): Move everything related to GitHub to own file * refactor(git): Replace `__new__` override with explicit constructor * fix(git): Check for failures when calling `git diff` * refactor(git): Extract `add` and `commit` into individual methods * feat(git): Add `binary` field to `FileChanges` objects * refactor(git): Use regex-based parsing method for `generate_from_diff_output` * fix(git): Only validate length in `Commit` class * refactor(git): Simplify `Commit` class Spurred on by [this review comment](#184 (comment)). Not having access to `Application` here means the `get_changes_from_git` and `get_details_from_git` are not possible to implement. This then causes a cascade of things: - Only having one source for `details` and `changes` means we can just use standard `@cached_properties` - However, we don't want to favorize people using the Github API instead of calls to git as it is less reliable. - To prevent this, the easiest is just to remove the `details` and `changes` properties altogether. Callers can still access these by using the forward-relationship methods `Git.get_commit_details` or `Remote.get_commit_details_and_changes` - To keep a consistent style and avoid confusing people, we remove the other reverse-relationship methods (`Commit.head` - `Git.get_head_commit` can be used instead. Same for `compare_to`) - We can then remove all the proxying methods. - Since we don't need to encode `ChangeSet`s, the `enc_hooks` are not needed anymore, `Commit`s are natively encodable + Remove extraneous tests and fix remaining ones * refactor(git): Make `ChangeSet.changes` public and immutable Unfortunately `msgspec` does not natively support encoding/decoding `MappingProxyType` objects, so we need to implement custom logic for that: jcrist/msgspec#632 * feat(git): Use SHA256 for `ChangeSet.digest` * refactor(git): Rename `Git.get_remote_details` to `Git.get_remote` * refactor(git): Tweak implementation of `CommitDetails` - Store both author and commiter details - Store these as `tuple`s - Store the commit date as a UNIX timestamp and only convert to `datetime` when needed - Tweak call to `git show` format + use `NUL` characters for splitting - Remove `parent_shas` list * refactor(git): Use `capture` instead of `run` * refactor(git): Merge the `CommitDetails` class into Commit This makes instantiation a bit clunkier (as can be seen in tests), but directly instantiating a `Commit` will be rare, more often getting it from a `git` command or GitHub. * refactor(git): Pack author and committer details into a `GitPersonDetails` struct * fix(git): Fix recurring typo `commiter` -> `committer` * refactor(git): Minor tweaks - Remove "file structure" comments - Lazily import a couple of things * feat(git): Use a single set of public methods for encoding and decoding We also now encode Path objects as simple `str`s instead of encoding them as their `repr` - this requires a change to test data. * refactor(git): Rename `ChangedFile.file` -> `ChangedFile.path` * refactor(git): Make `ChangeSet` a standard Python class rather than `Struct` * refactor(git): Move `generate_from_diff_output` logic to `ChangeSet` * refactor(git): Move and rename `Remote.get_commit_and_changes_from_remote` * feat(git): Expand support to all types of git remotes * feat(git): Replace diff-related methods with single `Git.get_changes` This is a lot cleaner and easier to use. Thanks @ofek for the suggestion !
…t`... (#184) * Port src/ changes from #181 * Port `tests/` changes from #181 * fix(git): Fix tests - move fixtures and use proper author details refactor(tests): Move fixtures to `git` folder feat(git): Add `commit_files` method replacing `tests.helpers.commit_file` + test changes fix(tests): Use `default_git_author` fixture in tests * feat(git): Add `Commit.enc_hook` and `Commit.dec_hook` * refactor(git): Make `Commit` objects frozen We can actually do this quite easily by avoiding the use of `_details` and `_changes` fields, and instead using `__dict__` to store the details and changes. This is the same technique that `cached_property` uses internally, so trying to access the cached property works as expected if the details or changes have been set by one of the `get_details_from_*` or `get_changes_from_*` methods. * refactor(git): Remove `commit_files` method * refactor(git): Minor fixes * refactor(git): Rename `FileChanges` to `ChangedFile` * refactor(git): Move everything related to GitHub to own file * refactor(git): Replace `__new__` override with explicit constructor * fix(git): Check for failures when calling `git diff` * refactor(git): Extract `add` and `commit` into individual methods * feat(git): Add `binary` field to `FileChanges` objects * refactor(git): Use regex-based parsing method for `generate_from_diff_output` * fix(git): Only validate length in `Commit` class * refactor(git): Simplify `Commit` class Spurred on by [this review comment](#184 (comment)). Not having access to `Application` here means the `get_changes_from_git` and `get_details_from_git` are not possible to implement. This then causes a cascade of things: - Only having one source for `details` and `changes` means we can just use standard `@cached_properties` - However, we don't want to favorize people using the Github API instead of calls to git as it is less reliable. - To prevent this, the easiest is just to remove the `details` and `changes` properties altogether. Callers can still access these by using the forward-relationship methods `Git.get_commit_details` or `Remote.get_commit_details_and_changes` - To keep a consistent style and avoid confusing people, we remove the other reverse-relationship methods (`Commit.head` - `Git.get_head_commit` can be used instead. Same for `compare_to`) - We can then remove all the proxying methods. - Since we don't need to encode `ChangeSet`s, the `enc_hooks` are not needed anymore, `Commit`s are natively encodable + Remove extraneous tests and fix remaining ones * refactor(git): Make `ChangeSet.changes` public and immutable Unfortunately `msgspec` does not natively support encoding/decoding `MappingProxyType` objects, so we need to implement custom logic for that: jcrist/msgspec#632 * feat(git): Use SHA256 for `ChangeSet.digest` * refactor(git): Rename `Git.get_remote_details` to `Git.get_remote` * refactor(git): Tweak implementation of `CommitDetails` - Store both author and commiter details - Store these as `tuple`s - Store the commit date as a UNIX timestamp and only convert to `datetime` when needed - Tweak call to `git show` format + use `NUL` characters for splitting - Remove `parent_shas` list * refactor(git): Use `capture` instead of `run` * refactor(git): Merge the `CommitDetails` class into Commit This makes instantiation a bit clunkier (as can be seen in tests), but directly instantiating a `Commit` will be rare, more often getting it from a `git` command or GitHub. * refactor(git): Pack author and committer details into a `GitPersonDetails` struct * fix(git): Fix recurring typo `commiter` -> `committer` * refactor(git): Minor tweaks - Remove "file structure" comments - Lazily import a couple of things * feat(git): Use a single set of public methods for encoding and decoding We also now encode Path objects as simple `str`s instead of encoding them as their `repr` - this requires a change to test data. * refactor(git): Rename `ChangedFile.file` -> `ChangedFile.path` * refactor(git): Make `ChangeSet` a standard Python class rather than `Struct` * refactor(git): Move `generate_from_diff_output` logic to `ChangeSet` * refactor(git): Move and rename `Remote.get_commit_and_changes_from_remote` * feat(git): Expand support to all types of git remotes * feat(git): Replace diff-related methods with single `Git.get_changes` This is a lot cleaner and easier to use. Thanks @ofek for the suggestion ! b78a5e1
No description provided.