Skip to content

Conversation

@Ishirui
Copy link
Contributor

@Ishirui Ishirui commented Sep 1, 2025

No description provided.

@Ishirui Ishirui force-pushed the pierrelouis.veyrenc/create-git-tooling branch 9 times, most recently from 77d51c7 to 332cdf9 Compare September 3, 2025 16:22
* Move `GitConfig` class to `ToolsConfig` in `tools.py`
* Update all references to `GitConfig` to use `ToolsConfig.git` instead of `RootConfig.git`
* Update tests and documentation to reflect this change

We should also consider moving this user/email couple to its own section instead of having it tied to `git`. `git` data could still be used as default values for this data however.
Since these methods were also used in contexts where the App is not fully initialized, i.e. during config generation, we have to implement a workaround allowing the methods to be called statically (instantiating `Git` requires passing in an `Application` object).
…ection in the config

We are conceptually manipulating a "generic" username rather than something inherently attached to `git` - `git` is simply the default way by which this username is deduced. Thus we configure it as its own object, using `git` data as a default, and show a warning if the two are ever out of sync.
…e changes

* Implement `ChangeSet` class
* Implement parsing function for building from `git diff` output
* Include `ChangeSet` in commit details
* Add `get_commit_changes` and `get_working_tree_changes` to git tool class
@Ishirui Ishirui force-pushed the pierrelouis.veyrenc/create-git-tooling branch from 332cdf9 to e68ded6 Compare September 3, 2025 17:06
Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

file: Path
type: ChangeType

patch: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the patch should be a property of the change but rather it would be used only to create the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should not keep track of what changes were actually made ? Or do you have an idea for a different format for storing these changes, rather than the patch lines ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think about it in terms of usage patterns and I foresee two:

  1. A caller only cares about the file paths that have changed, for example to know what to test on a PR. This usage should be as fast as possible and the caller only needs a sequence of str (preferable) or Path objects representing relative paths from the root of the repository.
  2. A caller does care about change metadata, for example a project with a single changelog file that has a CI job fail when a PR that modifies code does not add new lines to the history. This should also be very fast but of course requires extra processing compared to the first usage pattern. In this case, there is no need to save the patch contents because in order to create this class you would have had to consume the patch text already. So it should definitely have path and type keys but never persist the input it was given. If there is more metadata that would potentially be useful, you can add that now or we can when cases arise which is probably better. I was thinking a binary boolean and a lines structure with the counts of added & removed.



@dataclass(frozen=True, order=True)
class FileChanges:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT?

Suggested change
class FileChanges:
class ChangedFile:

@@ -0,0 +1,3 @@
I am file1 !
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I like your chosen mock data 😄
  2. The testdata directory name is 100% a Go convention, it should be called fixtures. Also we should nest deeper under fixtures perhaps with a git directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm, would you prefer the following file structure:

tests/
    fixtures/
        git/
        tools/
        ...
    cli/
    env/
    helpers/
    ...

Or this one ?

tests/
    cli/
        fixtures/
            ...
    env/
        fixtures/
            ...
    ...

I think both have their advantages:

  • The first one centralizes all test fixtures, which makes reusing them across test modules a lot easier (like I did here, where I use some both in tests/utils/git and tests/tools/test_git.py)
  • The second one keeps the fixtures closer to the tests which actually use them, which is probably easier to reason with.

Also, what about conftest.py files ? I don't think we can move them in a fixtures/ directory, as pytests handles loading them automatically and expects them at the same level or in a parent level to the test files themselves. Unless there is a way to override that ?

* Use `msgspec.Struct` instead of `dataclasses` for improved performance
* Remove useless `git` property from `LinuxContainer`
* Avoid runtime creation of mapping for `ChangeType`, instead using cascading `if`s
* Restrict `diff_output` argument of `generate_from_diff_output` to `list` instead of arbitrary `Iterable`
* Move `SHA1Hash` to `commit.py` and refactor usages
* Remove extraneous '"' in user config
* Fix `.git` suffix removal in `Git.get_remote_details`
* Revert markdown table formatting in `config.md`
* Move `UserConfig` further down in config file s
* Implement improved `SubprocessRunner.static_capture` method, replacing the `dda.tools.static_tool_capture` method
* Remove `tool` param from above - all calls to this function are made statically
* Merge `get_author_*` and equivalent properties + required test fixes
* Remove `_query_git_author_*` methods, either directly calling `SubprocessRunner.static_capture` or `git.author_*` depending on context
* Simplify test logic for setting git author name
@Ishirui Ishirui force-pushed the pierrelouis.veyrenc/create-git-tooling branch from e68ded6 to 40774fd Compare September 4, 2025 18:03
…thor details

* Restore `tools.GitConfig` config section
* Allow multiple emails in User config
* Add new `author_details` field, that can be either `inherit` or `system`
* In `inherit` mode, set git author details via env vars instead of manually calling `git config`
* Make `author_name` and `author_details` properties purer, not reading config by themselves.
* Adapt tests, creating new fixtures for both author detail modes
Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot better! Now overall let's try to reduce use of non-primitive objects in the parameters of APIs for better performance, only returning other types when necessary.

* Remove `repo_path` arguments like suggested by review
* Implement methods for automatically determining, and comparing against, a merge base.
* Add method in `Git` class to capture diff with a given set of args
* Cleanup an unused test fixture
* Add required tests
… `dict` subclass

The composition pattern is safer than direct inheritance + using structs allows for easy serialization and deserialization, which is already useful for test data but will be even more in the future, in order to store build metadata.
@Ishirui Ishirui force-pushed the pierrelouis.veyrenc/create-git-tooling branch from 5f28e3f to f8d13ba Compare September 8, 2025 14:18
* Move `static_tool_capture` out of `SubprocessRunner` class
* Remove `SHA1Hash` class, instead using normal `str`s and validating in `Commit.__post_init__` instead
* Tweak args used for diffing in `Git._capture_diff_lines`
* Tweak args used for getting untracked files in `Git.get_working_tree_changes`
* Use `...` diffing in `Git.get_changes_with_base`
* Convert `Commit` to a `Struct` instead of a dataclass
* Fix tests
@Ishirui Ishirui force-pushed the pierrelouis.veyrenc/create-git-tooling branch from f8d13ba to 77eaf99 Compare September 8, 2025 14:48
…instead of manually diffing every file

This is much higher performance, as only a fixed number of calls to `git` have to be made (considering the speed of git vs Python itself, we can probably say this makes the function O(1) instead of O(n)).
Moreover, this makes the function work properly on Windows, as we do not require `/dev/null` to exist anymore.
@Ishirui Ishirui force-pushed the pierrelouis.veyrenc/create-git-tooling branch from 77eaf99 to e48f51f Compare September 8, 2025 15:38
This new class centralizes all operations and information relating to git remotes and, in particular, GitHub.
This pattern makes using git classes much clearer and purer.
Ishirui added a commit that referenced this pull request Sep 9, 2025
Ishirui added a commit that referenced this pull request Sep 9, 2025
@Ishirui
Copy link
Contributor Author

Ishirui commented Sep 9, 2025

Closing in favor of #183 and #184

@Ishirui Ishirui closed this Sep 9, 2025
Ishirui added a commit that referenced this pull request Sep 9, 2025
Ishirui added a commit that referenced this pull request Sep 9, 2025
Ishirui added a commit that referenced this pull request Sep 10, 2025
Ishirui added a commit that referenced this pull request Sep 10, 2025
Ishirui added a commit that referenced this pull request Sep 11, 2025
Ishirui added a commit that referenced this pull request Sep 11, 2025
Ishirui added a commit that referenced this pull request Sep 11, 2025
Ishirui added a commit that referenced this pull request Sep 11, 2025
@ofek ofek deleted the pierrelouis.veyrenc/create-git-tooling branch September 11, 2025 22:20
Ishirui added a commit that referenced this pull request Sep 23, 2025
Ishirui added a commit that referenced this pull request Sep 23, 2025
Ishirui added a commit that referenced this pull request Sep 26, 2025
…t`... (#184)

* Port src/ changes from #181

* Port `tests/` changes from #181

* fix(git): Fix tests - move fixtures and use proper author details

refactor(tests): Move fixtures to `git` folder

feat(git): Add `commit_files` method replacing `tests.helpers.commit_file` + test changes

fix(tests): Use `default_git_author` fixture in tests

* feat(git): Add `Commit.enc_hook` and `Commit.dec_hook`

* refactor(git): Make `Commit` objects frozen

We can actually do this quite easily by avoiding the use of `_details` and `_changes` fields,
and instead using `__dict__` to store the details and changes.
This is the same technique that `cached_property` uses internally,
so trying to access the cached property works as expected if the details or changes have been set by one of the `get_details_from_*` or `get_changes_from_*` methods.

* refactor(git): Remove `commit_files` method

* refactor(git): Minor fixes

* refactor(git): Rename `FileChanges` to `ChangedFile`

* refactor(git): Move everything related to GitHub to own file

* refactor(git): Replace `__new__` override with explicit constructor

* fix(git): Check for failures when calling `git diff`

* refactor(git): Extract `add` and `commit` into individual methods

* feat(git): Add `binary` field to `FileChanges` objects

* refactor(git): Use regex-based parsing method for `generate_from_diff_output`

* fix(git): Only validate length in `Commit` class

* refactor(git): Simplify `Commit` class

Spurred on by [this review comment](#184 (comment)). Not having access to `Application` here means the `get_changes_from_git` and `get_details_from_git` are not possible to implement.

This then causes a cascade of things:
- Only having one source for `details` and `changes` means we can just use standard `@cached_properties`
- However, we don't want to favorize people using the Github API instead of calls to git as it is less reliable.
- To prevent this, the easiest is just to remove the `details` and `changes` properties altogether. Callers can still access these by using the forward-relationship methods `Git.get_commit_details` or `Remote.get_commit_details_and_changes`
- To keep a consistent style and avoid confusing people, we remove the other reverse-relationship methods (`Commit.head` - `Git.get_head_commit` can be used instead. Same for `compare_to`)
- We can then remove all the proxying methods.
- Since we don't need to encode `ChangeSet`s, the `enc_hooks` are not needed anymore, `Commit`s are natively encodable

+ Remove extraneous tests and fix remaining ones

* refactor(git): Make `ChangeSet.changes` public and immutable

Unfortunately `msgspec` does not natively support encoding/decoding `MappingProxyType` objects, so we need to implement custom logic for that: jcrist/msgspec#632

* feat(git): Use SHA256 for `ChangeSet.digest`

* refactor(git): Rename `Git.get_remote_details` to `Git.get_remote`

* refactor(git): Tweak implementation of `CommitDetails`

- Store both author and commiter details
- Store these as `tuple`s
- Store the commit date as a UNIX timestamp and only convert to `datetime` when needed
- Tweak call to `git show` format + use `NUL` characters for splitting
- Remove `parent_shas` list

* refactor(git): Use `capture` instead of `run`

* refactor(git): Merge the `CommitDetails` class into Commit

This makes instantiation a bit clunkier (as can be seen in tests), but directly instantiating a `Commit` will be rare, more often getting it from a `git` command or GitHub.

* refactor(git): Pack author and committer details into a `GitPersonDetails` struct

* fix(git): Fix recurring typo `commiter` -> `committer`

* refactor(git): Minor tweaks

- Remove "file structure" comments
- Lazily import a couple of things

* feat(git): Use a single set of public methods for encoding and decoding

We also now encode Path objects as simple `str`s instead of encoding them as their `repr` - this requires a change to test data.

* refactor(git): Rename `ChangedFile.file` -> `ChangedFile.path`

* refactor(git): Make `ChangeSet` a standard Python class rather than `Struct`

* refactor(git): Move `generate_from_diff_output` logic to `ChangeSet`

* refactor(git): Move and rename `Remote.get_commit_and_changes_from_remote`

* feat(git): Expand support to all types of git remotes

* feat(git): Replace diff-related methods with single `Git.get_changes`

This is a lot cleaner and easier to use. Thanks @ofek for the suggestion !
github-actions bot pushed a commit that referenced this pull request Sep 26, 2025
…t`... (#184)

* Port src/ changes from #181

* Port `tests/` changes from #181

* fix(git): Fix tests - move fixtures and use proper author details

refactor(tests): Move fixtures to `git` folder

feat(git): Add `commit_files` method replacing `tests.helpers.commit_file` + test changes

fix(tests): Use `default_git_author` fixture in tests

* feat(git): Add `Commit.enc_hook` and `Commit.dec_hook`

* refactor(git): Make `Commit` objects frozen

We can actually do this quite easily by avoiding the use of `_details` and `_changes` fields,
and instead using `__dict__` to store the details and changes.
This is the same technique that `cached_property` uses internally,
so trying to access the cached property works as expected if the details or changes have been set by one of the `get_details_from_*` or `get_changes_from_*` methods.

* refactor(git): Remove `commit_files` method

* refactor(git): Minor fixes

* refactor(git): Rename `FileChanges` to `ChangedFile`

* refactor(git): Move everything related to GitHub to own file

* refactor(git): Replace `__new__` override with explicit constructor

* fix(git): Check for failures when calling `git diff`

* refactor(git): Extract `add` and `commit` into individual methods

* feat(git): Add `binary` field to `FileChanges` objects

* refactor(git): Use regex-based parsing method for `generate_from_diff_output`

* fix(git): Only validate length in `Commit` class

* refactor(git): Simplify `Commit` class

Spurred on by [this review comment](#184 (comment)). Not having access to `Application` here means the `get_changes_from_git` and `get_details_from_git` are not possible to implement.

This then causes a cascade of things:
- Only having one source for `details` and `changes` means we can just use standard `@cached_properties`
- However, we don't want to favorize people using the Github API instead of calls to git as it is less reliable.
- To prevent this, the easiest is just to remove the `details` and `changes` properties altogether. Callers can still access these by using the forward-relationship methods `Git.get_commit_details` or `Remote.get_commit_details_and_changes`
- To keep a consistent style and avoid confusing people, we remove the other reverse-relationship methods (`Commit.head` - `Git.get_head_commit` can be used instead. Same for `compare_to`)
- We can then remove all the proxying methods.
- Since we don't need to encode `ChangeSet`s, the `enc_hooks` are not needed anymore, `Commit`s are natively encodable

+ Remove extraneous tests and fix remaining ones

* refactor(git): Make `ChangeSet.changes` public and immutable

Unfortunately `msgspec` does not natively support encoding/decoding `MappingProxyType` objects, so we need to implement custom logic for that: jcrist/msgspec#632

* feat(git): Use SHA256 for `ChangeSet.digest`

* refactor(git): Rename `Git.get_remote_details` to `Git.get_remote`

* refactor(git): Tweak implementation of `CommitDetails`

- Store both author and commiter details
- Store these as `tuple`s
- Store the commit date as a UNIX timestamp and only convert to `datetime` when needed
- Tweak call to `git show` format + use `NUL` characters for splitting
- Remove `parent_shas` list

* refactor(git): Use `capture` instead of `run`

* refactor(git): Merge the `CommitDetails` class into Commit

This makes instantiation a bit clunkier (as can be seen in tests), but directly instantiating a `Commit` will be rare, more often getting it from a `git` command or GitHub.

* refactor(git): Pack author and committer details into a `GitPersonDetails` struct

* fix(git): Fix recurring typo `commiter` -> `committer`

* refactor(git): Minor tweaks

- Remove "file structure" comments
- Lazily import a couple of things

* feat(git): Use a single set of public methods for encoding and decoding

We also now encode Path objects as simple `str`s instead of encoding them as their `repr` - this requires a change to test data.

* refactor(git): Rename `ChangedFile.file` -> `ChangedFile.path`

* refactor(git): Make `ChangeSet` a standard Python class rather than `Struct`

* refactor(git): Move `generate_from_diff_output` logic to `ChangeSet`

* refactor(git): Move and rename `Remote.get_commit_and_changes_from_remote`

* feat(git): Expand support to all types of git remotes

* feat(git): Replace diff-related methods with single `Git.get_changes`

This is a lot cleaner and easier to use. Thanks @ofek for the suggestion ! b78a5e1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants