Skip to content

Commit b78a5e1

Browse files
authored
[ACIX-1060] Add classes for handling git objects: Commit, ChangeSet... (#184)
* Port src/ changes from #181 * Port `tests/` changes from #181 * fix(git): Fix tests - move fixtures and use proper author details refactor(tests): Move fixtures to `git` folder feat(git): Add `commit_files` method replacing `tests.helpers.commit_file` + test changes fix(tests): Use `default_git_author` fixture in tests * feat(git): Add `Commit.enc_hook` and `Commit.dec_hook` * refactor(git): Make `Commit` objects frozen We can actually do this quite easily by avoiding the use of `_details` and `_changes` fields, and instead using `__dict__` to store the details and changes. This is the same technique that `cached_property` uses internally, so trying to access the cached property works as expected if the details or changes have been set by one of the `get_details_from_*` or `get_changes_from_*` methods. * refactor(git): Remove `commit_files` method * refactor(git): Minor fixes * refactor(git): Rename `FileChanges` to `ChangedFile` * refactor(git): Move everything related to GitHub to own file * refactor(git): Replace `__new__` override with explicit constructor * fix(git): Check for failures when calling `git diff` * refactor(git): Extract `add` and `commit` into individual methods * feat(git): Add `binary` field to `FileChanges` objects * refactor(git): Use regex-based parsing method for `generate_from_diff_output` * fix(git): Only validate length in `Commit` class * refactor(git): Simplify `Commit` class Spurred on by [this review comment](#184 (comment)). Not having access to `Application` here means the `get_changes_from_git` and `get_details_from_git` are not possible to implement. This then causes a cascade of things: - Only having one source for `details` and `changes` means we can just use standard `@cached_properties` - However, we don't want to favorize people using the Github API instead of calls to git as it is less reliable. - To prevent this, the easiest is just to remove the `details` and `changes` properties altogether. Callers can still access these by using the forward-relationship methods `Git.get_commit_details` or `Remote.get_commit_details_and_changes` - To keep a consistent style and avoid confusing people, we remove the other reverse-relationship methods (`Commit.head` - `Git.get_head_commit` can be used instead. Same for `compare_to`) - We can then remove all the proxying methods. - Since we don't need to encode `ChangeSet`s, the `enc_hooks` are not needed anymore, `Commit`s are natively encodable + Remove extraneous tests and fix remaining ones * refactor(git): Make `ChangeSet.changes` public and immutable Unfortunately `msgspec` does not natively support encoding/decoding `MappingProxyType` objects, so we need to implement custom logic for that: jcrist/msgspec#632 * feat(git): Use SHA256 for `ChangeSet.digest` * refactor(git): Rename `Git.get_remote_details` to `Git.get_remote` * refactor(git): Tweak implementation of `CommitDetails` - Store both author and commiter details - Store these as `tuple`s - Store the commit date as a UNIX timestamp and only convert to `datetime` when needed - Tweak call to `git show` format + use `NUL` characters for splitting - Remove `parent_shas` list * refactor(git): Use `capture` instead of `run` * refactor(git): Merge the `CommitDetails` class into Commit This makes instantiation a bit clunkier (as can be seen in tests), but directly instantiating a `Commit` will be rare, more often getting it from a `git` command or GitHub. * refactor(git): Pack author and committer details into a `GitPersonDetails` struct * fix(git): Fix recurring typo `commiter` -> `committer` * refactor(git): Minor tweaks - Remove "file structure" comments - Lazily import a couple of things * feat(git): Use a single set of public methods for encoding and decoding We also now encode Path objects as simple `str`s instead of encoding them as their `repr` - this requires a change to test data. * refactor(git): Rename `ChangedFile.file` -> `ChangedFile.path` * refactor(git): Make `ChangeSet` a standard Python class rather than `Struct` * refactor(git): Move `generate_from_diff_output` logic to `ChangeSet` * refactor(git): Move and rename `Remote.get_commit_and_changes_from_remote` * feat(git): Expand support to all types of git remotes * feat(git): Replace diff-related methods with single `Git.get_changes` This is a lot cleaner and easier to use. Thanks @ofek for the suggestion !
1 parent 432b1f8 commit b78a5e1

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

58 files changed

+2711
-22
lines changed

src/dda/config/model/__init__.py

Lines changed: 23 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ class RootConfig(Struct, frozen=True, omit_defaults=True):
4040

4141

4242
def construct_model(data: dict[str, Any]) -> RootConfig:
43-
return convert(data, RootConfig, dec_hook=__dec_hook)
43+
return convert(data, RootConfig, dec_hook=dec_hook)
4444

4545

4646
def get_default_toml_data() -> dict[str, Any]:
@@ -50,21 +50,40 @@ def get_default_toml_data() -> dict[str, Any]:
5050
RootConfig(),
5151
str_keys=True,
5252
builtin_types=(datetime.datetime, datetime.date, datetime.time),
53-
enc_hook=__enc_hook,
53+
enc_hook=enc_hook,
5454
)
5555

5656

57-
def __dec_hook(type: type[Any], obj: Any) -> Any: # noqa: A002
57+
def dec_hook(type: type[Any], obj: Any) -> Any: # noqa: A002
5858
if type is Path:
5959
return Path(obj)
6060

61+
from msgspec import convert
62+
63+
from dda.utils.git.changeset import ChangedFile, ChangeSet
64+
65+
if type is ChangeSet:
66+
# Since the dict decode logic from msgspec is not called here we have to manually decode the keys and values
67+
decoded_obj = {}
68+
for key, value in obj.items():
69+
decoded_key = dec_hook(Path, key)
70+
decoded_value = convert(value, ChangedFile, dec_hook=dec_hook)
71+
decoded_obj[decoded_key] = decoded_value
72+
return ChangeSet(changes=decoded_obj)
73+
6174
message = f"Cannot decode: {obj!r}"
6275
raise ValueError(message)
6376

6477

65-
def __enc_hook(obj: Any) -> Any:
78+
def enc_hook(obj: Any) -> Any:
6679
if isinstance(obj, Path):
6780
return str(obj)
6881

82+
from dda.utils.git.changeset import ChangeSet
83+
84+
# Encode ChangeSet objects as dicts
85+
if isinstance(obj, ChangeSet):
86+
return dict(obj.changes)
87+
6988
message = f"Cannot encode: {obj!r}"
7089
raise NotImplementedError(message)

src/dda/tools/git.py

Lines changed: 192 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,13 +5,18 @@
55

66
from contextlib import contextmanager
77
from functools import cached_property
8-
from typing import TYPE_CHECKING
8+
from typing import TYPE_CHECKING, Any
99

1010
from dda.tools.base import ExecutionContext, Tool
11-
from dda.utils.git.constants import GitEnvVars
11+
from dda.utils.git.changeset import ChangeSet
12+
from dda.utils.git.commit import GitPersonDetails
1213

1314
if TYPE_CHECKING:
14-
from collections.abc import Generator
15+
from collections.abc import Generator, Iterable
16+
17+
from dda.utils.fs import Path
18+
from dda.utils.git.commit import Commit
19+
from dda.utils.git.remote import Remote
1520

1621

1722
class Git(Tool):
@@ -25,6 +30,8 @@ class Git(Tool):
2530

2631
@contextmanager
2732
def execution_context(self, command: list[str]) -> Generator[ExecutionContext, None, None]:
33+
from dda.utils.git.constants import GitEnvVars
34+
2835
author_name = self.app.config.tools.git.author.name.strip()
2936
author_email = self.app.config.tools.git.author.email.strip()
3037
env_vars = {}
@@ -51,6 +58,8 @@ def author_name(self) -> str:
5158
"""
5259
from os import environ
5360

61+
from dda.utils.git.constants import GitEnvVars
62+
5463
if env_username := environ.get(GitEnvVars.AUTHOR_NAME):
5564
return env_username
5665

@@ -66,7 +75,187 @@ def author_email(self) -> str:
6675
"""
6776
from os import environ
6877

78+
from dda.utils.git.constants import GitEnvVars
79+
6980
if env_email := environ.get(GitEnvVars.AUTHOR_EMAIL):
7081
return env_email
7182

7283
return self.capture(["config", "--get", "user.email"]).strip()
84+
85+
def get_remote(self, remote_name: str = "origin") -> Remote:
86+
"""
87+
Get the details of the given remote for the Git repository in the current working directory.
88+
"""
89+
from dda.utils.git.remote import Remote
90+
91+
remote_url = self.capture(
92+
["config", "--get", f"remote.{remote_name}.url"],
93+
).strip()
94+
95+
return Remote.from_url(remote_url)
96+
97+
def get_commit(self, ref: str = "HEAD") -> Commit:
98+
"""
99+
Get a Commit object from the Git repository in the current working directory for the given reference (default: HEAD).
100+
"""
101+
from dda.utils.git.commit import Commit
102+
103+
raw_details = self.capture([
104+
"--no-pager",
105+
"show",
106+
"--no-color",
107+
"--no-patch",
108+
"--quiet",
109+
# Use a format that is easy to parse
110+
# fmt: commit hash, commit subject, commit body, committer name, committer email, committer date, author name, author email, author date
111+
"--format=%H%x00%s%x00%b%x00%cn%x00%ce%x00%ct%x00%an%x00%ae%x00%at",
112+
f"{ref}^{{commit}}",
113+
])
114+
115+
# Extract parts
116+
parts = raw_details.split("\0")
117+
sha1, *parts = parts
118+
commit_subject, commit_body, *parts = parts
119+
committer_name, committer_email, commit_date, *parts = parts
120+
author_name, author_email, author_date = parts
121+
122+
# Process parts
123+
124+
# Will give a timestamp in UTC
125+
# we don't care what tz the author was in as long as we stay consistent and are able to display all times in the user's timezone
126+
commit_timestamp_str = commit_date.split(" ")[0]
127+
commit_timestamp = int(commit_timestamp_str)
128+
author_timestamp_str = author_date.split(" ")[0]
129+
author_timestamp = int(author_timestamp_str)
130+
message = (commit_subject + "\n\n" + commit_body).strip()
131+
132+
author_details = GitPersonDetails(author_name, author_email, author_timestamp)
133+
committer_details = GitPersonDetails(committer_name, committer_email, commit_timestamp)
134+
135+
return Commit(
136+
sha1=sha1,
137+
author=author_details,
138+
committer=committer_details,
139+
message=message,
140+
)
141+
142+
def add(self, paths: Iterable[Path]) -> None:
143+
"""
144+
Add the given paths to the index.
145+
Will fail if any path is not under cwd.
146+
"""
147+
self.capture(["add", *map(str, paths)])
148+
149+
def commit(self, message: str, *, allow_empty: bool = False) -> None:
150+
"""
151+
Commit the changes in the index.
152+
"""
153+
args = ["commit", "-m", message]
154+
if allow_empty:
155+
args.append("--allow-empty")
156+
self.capture(args)
157+
158+
def commit_file(self, path: Path, *, content: str, commit_message: str) -> None:
159+
"""
160+
Create and commit a single file with the given content.
161+
"""
162+
path.write_text(content)
163+
self.add((path,))
164+
self.commit(commit_message)
165+
166+
def _get_patch(self, *args: str, **kwargs: Any) -> str:
167+
diff_args = [
168+
"-c",
169+
"core.quotepath=false",
170+
"diff",
171+
"-U0",
172+
"--no-color",
173+
"--no-prefix",
174+
"--no-renames",
175+
"--no-ext-diff",
176+
]
177+
return self.capture([*diff_args, *args], **kwargs).strip()
178+
179+
def get_changes(
180+
self,
181+
ref: str = "HEAD",
182+
/,
183+
*,
184+
start: str | None = None,
185+
merge_base: bool = False,
186+
working_tree: bool = False,
187+
remote_name: str = "origin",
188+
) -> ChangeSet:
189+
"""
190+
Use `git` to compute a ChangeSet between two refs.
191+
192+
Parameters:
193+
ref: The reference to compare to. Default is HEAD.
194+
start: The reference to compare from.
195+
Default is None, in which case the parent commit of the ref is used.
196+
If merge_base is also True, the merge base is used instead of the parent commit.
197+
merge_base: Whether to compute the differences between the refs starting from the merge base.
198+
working_tree: Whether to include the working tree changes. Default is False.
199+
remote_name: The name of the remote to compare to. Default is origin.
200+
201+
Returns:
202+
A ChangeSet representing the differences between the refs.
203+
204+
Examples:
205+
```python
206+
# Get the changes of the HEAD commit
207+
changes = git.get_changes(ref="HEAD")
208+
209+
# Get the changes between two commits
210+
changes = git.get_changes(ref=commit1.sha1, start=commit2.sha1)
211+
212+
# Get the changes between the HEAD commit and the main branch
213+
changes = git.get_changes(ref="HEAD", start="origin/main")
214+
215+
# Get the changes between the HEAD commit and the main branch starting from the merge base
216+
changes = git.get_changes(ref="HEAD", start="origin/main", merge_base=True)
217+
218+
# Get _only_ the working tree changes
219+
changes = git.get_changes(ref="HEAD", start="HEAD", working_tree=True)
220+
```
221+
"""
222+
if start is None:
223+
revspec = f"{remote_name}...{ref}" if merge_base else f"{ref}^!"
224+
elif merge_base:
225+
revspec = f"{start}...{ref}"
226+
else:
227+
revspec = f"{start}..{ref}"
228+
229+
patches = [self._get_patch(revspec)]
230+
231+
if working_tree:
232+
patches.append(self._get_working_tree_patch())
233+
234+
# Filter out any empty patches
235+
patches = [patch.strip() for patch in patches if patch.strip()]
236+
237+
return ChangeSet.generate_from_diff_output(patches)
238+
239+
def _get_working_tree_patch(self) -> str:
240+
from os import environ
241+
242+
from dda.utils.fs import temp_file
243+
244+
with temp_file(suffix=".git_index") as temp_index_path:
245+
# Set up environment with temporary index
246+
temp_env = dict(environ)
247+
temp_env["GIT_INDEX_FILE"] = str(temp_index_path.resolve())
248+
self.run(["read-tree", "HEAD"], env=temp_env)
249+
250+
# Get list of untracked files
251+
untracked_files_output = self.capture(
252+
["ls-files", "--others", "--exclude-standard", "-z"], env=temp_env
253+
).strip()
254+
untracked_files = [x.strip() for x in untracked_files_output.split("\0") if x.strip()]
255+
256+
# Add untracked files to the index with --intent-to-add
257+
if untracked_files:
258+
self.run(["add", "--intent-to-add", *untracked_files], env=temp_env)
259+
260+
# Get all changes (tracked + untracked) with a single diff command
261+
return self._get_patch("HEAD", env=temp_env)

src/dda/utils/fs.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -170,3 +170,22 @@ def temp_directory() -> Generator[Path, None, None]:
170170

171171
with TemporaryDirectory() as d:
172172
yield Path(d).resolve()
173+
174+
175+
@contextmanager
176+
def temp_file(suffix: str = "") -> Generator[Path, None, None]:
177+
"""
178+
A context manager that creates a temporary file and yields a path to it. Example:
179+
180+
```python
181+
with temp_file() as temp_file:
182+
...
183+
```
184+
185+
Yields:
186+
The resolved path to the temporary file, following all symlinks.
187+
"""
188+
from tempfile import NamedTemporaryFile
189+
190+
with NamedTemporaryFile(suffix=suffix) as f:
191+
yield Path(f.name).resolve()

0 commit comments

Comments
 (0)