Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radicle-surf: refactor design: part 4 #37

Merged
merged 1 commit into from
Nov 1, 2022

Conversation

keepsimple1
Copy link
Contributor

@keepsimple1 keepsimple1 commented Oct 21, 2022

Continued work for #27, part 4:

Copy link
Collaborator

@FintanH FintanH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm liking the additions :)

radicle-surf/docs/refactor-design.md Outdated Show resolved Hide resolved
@keepsimple1 keepsimple1 force-pushed the new-design-4 branch 2 times, most recently from b2071eb to 7864633 Compare October 25, 2022 05:47
@keepsimple1 keepsimple1 changed the title WIP: radicle-surf: refactor design: part 4 radicle-surf: refactor design: part 4 Oct 25, 2022
pub fn initial_diff(&self, rev: &Rev) -> Result<Diff, Error> {

/// Returns the diff of a specific revision.
pub fn diff_of_rev(&self, rev: &Rev) -> Result<Diff, Error>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this useful at all?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me, is this the one that diffs the rev against its immediate parent? I think that case is useful because it's what's displayed if you navigate to a single commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're correct.

Copy link
Contributor

@cloudhead cloudhead Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would call it something more specific though, they are all "diffs of revs".. maybe parent_diff or diff_from_parent

In this design, `History` does not keep `RevWalk` in its state. It will create
a new one when `iter()` is called. I like the immutable interface of this design
but did not implement it in the current code mainly because the libgit2 doc says
[creating a new `RevWalk` is relatively expensive](https://libgit2.org/libgit2/#HEAD/group/revwalk/git_revwalk_new).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a clickable link for the libgit2 API doc . If people are interested in this alternative design, I can do some measurements to see how "expensive" git_revwalk_new actually is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a super interesting find! Not only is it expensive it's also not thread-safe to share 😱

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add, the "not thread-safe" part would make the alternative design a bit more attractive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, fun fact! The repository can't be shared across threads either 🙃 In Rust this is indicated by it either having a Sync impl or not which in this case it's !Sync https://docs.rs/git2/0.15.0/git2/struct.Repository.html#impl-Sync-for-Repository.

Having the Revwalk inside means that we don't need to create a new one every time, but rather can reconfigure it for each walk. It also seems to reset after iteration completes 🤔 It might be worth playing around with scenarios of reusing the same History and see what happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt::Debug for History might be able to use reset and push head again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fmt::Debug for History might be able to use reset and push head again.

I don't understand, can you elaborate :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I was thinking that we could iterator the history to print out the commit list in fmt::Debug but then realized it does not take &mut self.

let repo = self.repo.clone();
self.filter_map(
move |commit| match repo.repo_ref.find_commit(commit.id.into()) {
Ok(git2_commit) => match repo.diff_commit_and_parents(&path, &git2_commit) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work with binary files due to this line, unfortunately.

Copy link
Contributor Author

@keepsimple1 keepsimple1 Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this only affects a flag, but the binary file is still found. I am adding a test case for this.


/// Returns an iterator that only iterates commits for a file specified by
/// `path`, in this history.
pub fn has_file(self, path: file_system::Path) -> impl Iterator<Item = Commit> + 'a {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should be Result<Commit, Error> so that errors aren't filtered out, but instead returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have the Item type consistent as the History iterator item type, i.e. Commit. I agree that this is a trade-off. If we do want to bubble up the potential errors, I think we should change History iterator returning Result<> as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the iterator only returned Commit before it was safe to do so. Don't be afraid to change the code in these sort of cases as semantics change :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this method to return History as well.

radicle-surf/src/vcs/git/repo.rs Outdated Show resolved Hide resolved
.id;
assert_eq!(root_last_commit_id, Some(expected_oid));
}

#[test]
fn binary_file() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to check if we can find binary files in a history. Also opened an issue #40 that will look into the binary flag as well.

fn peel(&self, repo: &RepositoryRef) -> Result<Commit, Error>;
}

impl IntoCommit for &Commit {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make the clone happen by the caller, not the trait.

Suggested change
impl IntoCommit for &Commit {
impl IntoCommit for Commit {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

pub fn diff(&self, from: &Rev, to: &Rev) -> Result<Diff, Error> {
let from_commit = self.rev_to_commit(from)?;
let to_commit = self.rev_to_commit(to)?;
pub fn diff<R: FromRevision>(&self, from: R, to: R) -> Result<Diff, Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're turning these into Commits so this should use the IntoCommit trait :) The phrase "statement of intent" comes to mind. Since revisions can be used to refer to any objects they're too general, but IntoCommit tells us that we're definitely going through Commit objects.

Suggested change
pub fn diff<R: FromRevision>(&self, from: R, to: R) -> Result<Diff, Error> {
pub fn diff<R: IntoCommit>(&self, from: R, to: R) -> Result<Diff, Error> {

Copy link
Contributor Author

@keepsimple1 keepsimple1 Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to add a comment, but the reason I used FromRevision is that git-diff supports other revs other than commits, like two files (blobs).

Are we sure this diff will only support commits going forward? (I know it's hard to predict the future, but I was thinking maybe it's beneficial to use FromRevision here).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's a fair point! I missed this function diff_blobs.

Are we sure this diff will only support commits going forward? (I know it's hard to predict the future, but I was thinking maybe it's beneficial to use FromRevision here).

I'm happy to go forward with FromRevision! Good catch :)

/// Supports various ways to specify a revision used in Git.
pub trait FromRevision {
/// Resolves a revision into an object id in `repo`.
fn object_id(&self, repo: &RepositoryRef) -> Result<Oid, Error>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary but I'd probably call this id. There's no other types of ids lying around :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do feel object_id has more advantages than id in this case: ;-)

object_id in Git is well defined, but id is less so. And the return type is Oid, not Id.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I concede 🙇

Comment on lines 165 to 170
let git2_oid = repo.repo_ref.refname_to_id(&self.refname())?;
Ok(git2_oid.into())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let git2_oid = repo.repo_ref.refname_to_id(&self.refname())?;
Ok(git2_oid.into())
Ok(repo.repo_ref.refname_to_id(&self.refname()).map(Oid::from)?)

Comment on lines 172 to 178
let git2_oid = repo.repo_ref.refname_to_id(&self.refname())?;
Ok(git2_oid.into())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let git2_oid = repo.repo_ref.refname_to_id(&self.refname())?;
Ok(git2_oid.into())
Ok(repo.repo_ref.refname_to_id(&self.refname()).map(Oid::from)?)

/// Supports various ways to specify a commit in Git.
pub trait IntoCommit {
/// Resolves into a commit in `repo`.
fn peel(&self, repo: &RepositoryRef) -> Result<Commit, Error>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just realising that this isn't a git2::Commit. I think for getting the most out of this trait it should return git2::Commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of goals I listed is to hide git2 in our new API. I hope we can reach that goal.


/// Returns the full ref name of the tag.
pub fn refname(&self) -> String {
format!("refs/tags/{}", self.name().name())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we should start using the git-ref-format crate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into this when starting on #15 .

@@ -143,18 +143,80 @@ The core API is:

```Rust
imp RepositoryRef {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that you have a few imp typos which should be impl FYI :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

/// in the `repo`.
///
/// As we have the `head`, this history is guaranteed not empty.
pub struct History<'a> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love it if we put this in its own module :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will work on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

let repo = self.repo.clone();
self.filter_map(
move |commit| match repo.repo_ref.find_commit(commit.id.into()) {
Ok(git2_commit) => match repo.diff_commit_and_parents(&path, &git2_commit) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is the right way of going about it checking if the file exists in the history. On the one hand, I was like "clever :)", on the other it seemed like a bit of indirection. I'd be curious as to what the difference is between calculating a diff vs just getting a tree and seeing if the entry is in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that we wanted to list commits for the file only if there is a change to the file. If that is not the case, then we can do it differently here.


/// Returns an iterator that only iterates commits for a file specified by
/// `path`, in this history.
pub fn has_file(self, path: file_system::Path) -> impl Iterator<Item = Commit> + 'a {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if returning the File and the Commit would be useful here. Imagining I have navigated to the file history page -- would it be better if the click on the commit and then had to request for the blob/file or would it be better if the data was already there ready to serve? I think the latter but that would also mean it will take up more memory :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file history starts with a list of commits, and only when the user clicks / expands a commit we will show the actual file, then I think we can do the lazy loading.

If we preload File as well for every commit, in addition to the extra memory, the history loading time can potentially be long for two reasons: 1) a file can be big 2) the history can be a long one with many commits.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we preload File as well for every commit, in addition to the extra memory, the history loading time can potentially be long for two reasons: 1) a file can be big 2) the history can be a long one with many commits.

Well since iterators are lazy, I don't think either of these points is a problem -- unless they get collected into a structure.

If the file history starts with a list of commits, and only when the user clicks / expands a commit we will show the actual file, then I think we can do the lazy loading.

Based on this though, it just gives a list of commits. Annoyingly though, if you click into the commit it gives you all the files, which sucks if you're just focusing on a single file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the next comment, if you click the icon next to the commit link, then it will give you only the single file.

type Item = Commit;

fn next(&mut self) -> Option<Self::Item> {
// If we encounter any errors, will return None.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this, tbh. This will lead to silent errors and will make our lives difficult trying to debug when things go wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will work on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed to return a Result in Item.


/// Returns an iterator that only iterates commits for a file specified by
/// `path`, in this history.
pub fn has_file(self, path: file_system::Path) -> impl Iterator<Item = Commit> + 'a {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a reason you didn't want to follow suit on a similar formulation of my sketch here https://github.com/radicle-dev/radicle-git/pull/39/files#diff-72117ea7d2958719f587571649f74bbdbdac6c8a54cad6d3ca0247189a85aefcR5? Looking to understand what you like or don't like about the approach :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I like the optional FilterBy and I will try to see if can pass it in a modify function to a History. I didn't like BlobAt as I don't think History needs to yield an actual File (or Blob) object. I also think maybe we can modify a History to get a history for a file, instead of a constructor History::file().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I like the optional FilterBy and I will try to see if can pass it in a modify function to a History.

Ha! The FilterBy was what I didn't like about that approach 😄 It felt like a code smell waiting to happen as more filters would be added and wouldn't be used.

I didn't like BlobAt as I don't think History needs to yield an actual File (or Blob) object.

Understood. What about returning the Oid so that the object can be retrieved easily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated to use FilterBy internally but not ask for them in new().

Understood. What about returning the Oid so that the object can be retrieved easily?

I didn't get chance to work on this yet. I will play around with by_path() method again.

Copy link
Contributor Author

@keepsimple1 keepsimple1 Oct 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are good with this as by_path will return a History of commits for this file. When iterating those commits, if the user wanted to open the file at that point, we can retrieve the object on-demand using the commit's Oid (using git2 tree's get_path as in your diff). I.e. we don't have to pre-load and store the file's Oid earlier. I think this is similar with what GitHub does for a file history:

Screenshot_20221031_102413

And the icon of opening the file at one point also links to a path under the commit Oid:
Screenshot_20221031_102621

link: https://github.com/radicle-dev/radicle-git/blob/c3d9a2bdc511dbf9a78b6c5fc1b3ac6640595e56/radicle-surf/src/file_system/directory.rs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added get_file() method for Commit, with a test case showing its usage. I think there is still room to iterate it. One thing is: should we add a repo reference in Commit so that its method calls can be more concise? I am leave it as is for this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing is: should we add a repo reference in Commit so that its method calls can be more concise? I am leave it as is for this PR.

Mmmm that sounds like something I'd like to avoid. Carrying around the repo hidden in these objects comes across as a smell to me. You're hiding away the state rather than explicitly showing what functions do.

pub fn head(&self) -> &Commit;

/// Returns an iterator that only iterates commits that has the file `path`.
pub fn has_file(self, path: file_system::Path) -> impl Iterator<Item = Commit> + 'a;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I would call this by_path or filter_by_path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

pub fn has_file(self, path: file_system::Path) -> impl Iterator<Item = Commit> + 'a;

/// Reloads the history so that a caller can iterate it again.
pub fn reload(&mut self);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not typical of rust iterators, normally they are consumed by the iteration. I'm not sure why we would need to re-iterate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using it when working on the patch. I am not sure if our users will need it. I can remove it for now.

@@ -66,6 +113,7 @@ pub struct Repository(pub(super) git2::Repository);
///
/// Use the `From<&'a git2::Repository>` implementation to construct a
/// `RepositoryRef`.
#[derive(Clone)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy would also make sense for a reference type.

@@ -138,3 +141,35 @@ where
})
}
}

/// Supports various ways to specify a revision used in Git.
pub trait FromRevision {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be called ToOid, based on the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this update, I was debating between FromRevision and ToOid (or IntoOid). Eventually I like the fact that FromRevision has "Revision" in it and "Revision" is an official term in git, which is what we like to support via this trait.

ToOid does not (explicitly) tell what kind of types we like to support, even though we could explain them in doc comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, just to clarify a bit more -- I formalised this here https://github.com/radicle-dev/radicle-git/pull/38/files#diff-8fa8f2eb0bba44b7eb0f50532935cf0db8569f5441672e5acb7d9d91401887baR82-R109.

The idea is to resolve a git revision into its corresponding Oid.

@@ -166,3 +166,39 @@ impl<'repo> TryFrom<git2::Commit<'repo>> for Commit {
})
}
}

/// Supports various ways to specify a commit in Git.
pub trait IntoCommit {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what this trait does, it would make more sense to call it ToCommit.

See: https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that for some types like &str, ToCommit makes more sense. For some other types like Ref, IntoCommit feels more natural.

If we name it as ToCommit, then maybe the method can be renamed to to_commit.

@keepsimple1 keepsimple1 force-pushed the new-design-4 branch 3 times, most recently from 9827955 to 81f9a69 Compare October 28, 2022 06:36
Copy link
Collaborator

@FintanH FintanH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me :)

If you don't mind, I'll take a pass at the document this week to consolidate it with the design document I wrote. So the final product will give a high-level view as well as the implementation details you're providing in the redesign document.

.history(&rev)?
.filter_map(|c| match c {
Ok(c) => Some(Header::from(c)),
Err(_) => None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose to not return these errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I just learned that Collect can do some magic with Result. Will update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -138,3 +144,54 @@ where
})
}
}

/// Supports various ways to specify a revision used in Git.
pub trait FromRevision {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we should just call this Revision to avoid any confusion around the From

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. In the next PR, I hope I can remove the other Rev type, otherwise it can also be confusing.

- Update design doc.
- Refactor History API.
- Use FromRevision and IntoCommit traits.
- Add more test cases.

Signed-off-by: Han Xu <keepsimple@gmail.com>
but did not implement it in the current code mainly because the libgit2 doc says
[creating a new `RevWalk` is relatively expensive](https://libgit2.org/libgit2/#HEAD/group/revwalk/git_revwalk_new).

### List refs and retrieve their metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about the below but we can follow up. I think it'd be better to think about these once you're more familiar with git-ref-format.

@FintanH FintanH merged commit 7a82c40 into radicle-dev:main Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants