Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

radicle-surf: design document #38

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

radicle-surf: design document #38

wants to merge 4 commits into from

Conversation

FintanH
Copy link
Collaborator

@FintanH FintanH commented Oct 24, 2022

The work on the refactoring document[0] has been extremely useful in pinpointing problem areas in the radicle-surf redesign.

This document intends to supplement that by outlining a high-level API that the redesign can follow. The design specifies a series of components and recommended ways of implementing those components. The exact implementation details are avoided. It also aims to provide more education around different concepts in git by supplying git documentation and explanations of the different types one may come across in git.

Copy link
Contributor

@keepsimple1 keepsimple1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This naturally creates some conflicts with the mentioned refactor design doc and created some confusion for me. I discussed my confusions off this PR. Meanwhile, I provided some comments inline.

`refs/heads/main` branch would mean
`refs/namespaces/rad/refs/heads/main`.

With the above in mind, the following API functions are suggested:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API listed are all free-standing functions that takes in &Storage. Probably worth explaining what &Storage is and why the API is designed using it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, good point! Maybe to reiterate -- and perhaps I didn't explain this well -- the functions are merely sketches and it's the functionality I wanted to focus on. I mention that here.

But I can also mention that Storage is deliberately opaque and merely represents any way of accessing a git repository. To clarify, this attempt doesn't care about the backing storage -- it just cares about the functions (in whatever form) exist to use.


```rust
/// Return a list of references based on the `pattern` string supplied, e.g. `refs/rad/*`
pub fn references(storage: &Storage, pattern: PatterStr) -> Result<References, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general function overlaps with more specific functions below (e.g. branches) in the sense that the user can use either of them to get the local branches (or other types of references).

Will Reference be an enum of different type of references? Or will it be basically a string? If it's an enum, we can get away without other functions for special reference types. On the other hand, special functions provide better type checking.

Do we have some user inputs on this topic, from our previous experience?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This general function overlaps with more specific functions below (e.g. branches) in the sense that the user can use either of them to get the local branches (or other types of references).

I deliberately added this as a general case! :) This is to leave the API open for upstream users to fetch references under namespaces that they define themselves. The cobs use case is a prime example -- the clients team may want to be able to fetch collaborative objects and they could use radicle-surf tooling for that. Would that be right, @cloudhead?

Will Reference be an enum of different type of references? Or will it be basically a string? If it's an enum, we can get away without other functions for special reference types. On the other hand, special functions provide better type checking.

References would be an iterator over the references that match the pattern. Each reference would be their name and their target -- Oid for direct references and another reference for symbolic references.

The same would go for Branches, but the reason it has another type is that the names are expected to be prefixed with refs/heads. Similar for Tags, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is pretty standard and we wouldn't want to lose this when moving from git2.

pub fn references(storage: &Storage, pattern: PatterStr) -> Result<References, Error>;

/// Return a list of branches based on the `pattern` string supplied, e.g. `refs/heads/features/*`
pub fn branches(storage: &Storage, pattern: BranchPattern) -> Result<Branches, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the BranchPattern parameter? How common would our users like to provide an actual pattern, vs. providing no patterns?

If providing an actual pattern is not common, we could consider to support it via a separate filter function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! I guess we could go either way. If a pattern was taken then we could define a constant:

lazy_static! {
  static ref ALL: BranchPattern = pattern_str!("refs/heads/*");
}

As long as Branches stays an iterator, I think the filter approach after works equally as well! :)

pub fn tags(storage: &Storage, pattern: TagPattern) -> Result<Tags, Error>;

/// Return a list of notes based on the `pattern` string supplied, e.g. `refs/notes/blogs/*`
pub fn notes(storage: &Storage) -> Result<Notes, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me what the result types of the above functions are. I.e. what Branches, Remotes, Tags, Notes would look like.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I probably should have been more clear but as mentioned here, #38 (comment), they would be iterators over references but with new types to signal they start with their correct namespaces.

describe [revisions][git-revisions]. A revision in `git` is a way of
specifying an `Oid`. This can be done in a multitude of ways. One can
also specify a range of `Oid`s (think `git log`). The API will support
taking revisions as parameters where an `Oid` is expected. It will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking whether Revision or Oid should be the basic identifier. And, whether FromRevision or IntoOid should be the trait to define in this context.

Further more, in the refactor design, I was thinking if we could define CommitId (it's an Oid) and not expose Oid in the API. The rationale is that Commit is a concept even high-level user would know & use, but Object is probably not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking whether Revision or Oid should be the basic identifier. And, whether FromRevision or IntoOid should be the trait to define in this context.

So the reason I was thinking FromRevision is useful is that it makes function calls more flexible. For example, say I want a specific Commit, I may know a reference name that points to it, or its full SHA1, or even it's short-SHA1. So, all these function calls would result in the same Commit:

commit(storage, refname!("refs/heads/surf-design"))
commit(storage, "e9e23f49641b6b9a850969e5fb87f301fc10d2b9")
commit(storage, "e9e23f49641b6b9")

Does that make sense?

Further more, in the refactor design, I was thinking if we could define CommitId (it's an Oid) and not expose Oid in the API. The rationale is that Commit is a concept even high-level user would know & use, but Object is probably not.

This is what I was calling Commitish :) This is terminology that I've come across in git docs quite frequently and here's an SO answer that explains it https://stackoverflow.com/a/23303550

So, yes I think the high-level would be exposing a Commitish or something of that kind. Again, I define it as a trait to be more flexible. Allowing Oids, reference names, or whatever can resolve to a Commit. Our main users, the rest of the clients team, are familiar with Oids and Objects but we can hide some of those things too :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about this, and I think your approach make sense. I've adopted your suggestions and used traits for the API changes in PR #37 , with small changes to the naming. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about this, and I think your approach make sense. I've adopted your suggestions and used traits for the API changes in PR #37 , with small changes to the naming. Thanks!

Amazing, teamwork makes the dreamwork 😎

Comment on lines +130 to +234
pub fn blob<R: FromRevision>(storage: &Storage, rev: R) -> Result<Blob, Error>;

/// Get the tree found by `oid`.
pub fn tree<R: FromRevision>(storage: &Storage, rev: R) -> Result<Tree, Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although many engineers know about blob and tree objects, they are Git internals. For a high-level lib like radicle-surf, I think we should not expose them in the API, at least not initially, unless requested by our users explicitly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair! I still think it would be good to have them available. They may be available through an objects module but not at the top level.

I think for libraries like this, it's better to have the functionality available. Just because we can't imagine the use case doesn't mean it doesn't exist yet 😁

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can start with using file, directory in our API, and only support blob, tree when users request them in the future. (Unless users have already requested ;-) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file/directory only makes sense for working copies; if this library is going to be used on bare repositories, I don't see how we can get around blob and tree?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would they not be equivalently used, no? I've advised to have these functions available in a module and not expose them at the top-level module.

This is another point for standalone functions @keepsimple1. You can hide them because they're not attached to the struct as methods 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would they not be equivalently used, no? I've advised to have these functions available in a module and not expose them at the top-level module.

Yeah, I mean if it's just a question of naming, and File has all the capabilities of Blob, and Directory has all the capabilities of Tree, then it could work; although I'd say it's a little confusing because it's not an abstraction, just a rename.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, that's fair. I would opt for a non-specialised tree/blob and then a specialised file/directory because they won't just be renamed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file/directory only makes sense for working copies; if this library is going to be used on bare repositories, I don't see how we can get around blob and tree?

GitHub repos are bare repos but they still present "files" and "directories". (They might did some internal extra work for that, but it is an implementation detail).

I guess that "files" and "directories" are more identified by "paths", while "blob" and "tree" are identified by their Oid. We could support both. I just think if I were to use radicle-surf to write an UI I would just use file and directory as much as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it depends on whether we're ok with users of surf to still require direct access to the git2 repository. If that's by design, then we could get away with File/Directory. But if the plan is for users of this library to not need git2 access, then we might need access to these lower level objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not let users of surf to have direct access to git2. I am ok if it turns out that we need to provide users access to lower-level objects via our API.


```rust
/// Get the `Directory` found at `commit`.
pub fn directory<C: Commitish, P: AsRef<Path>>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to a previous comment, I am pondering in the refactor design that if we can define a CommitId and IntoCommitId trait. Then directory will just take a CommitId (not a Commit itself).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I touched on here, https://github.com/radicle-dev/radicle-git/pull/38/files#r1004154254, this can be an Oid for a Commit 😁 Or anything else that resolves to a Commit, e.g. a reference, a reference name, a tag object that points to a commit, a tag name that resolves to a tag object that points to a commit, etc. 😄

type Error;

/// Resolve the revision to its `Oid`, if possible.
fn peel(&self, storage: &Storage) -> Result<Oid, Self::Error>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this be used? If I understand correctly, a revision takes a multitude of forms.

So we would have a Commit, Tree, etc. struct and then run the_commit.peel(&storage)? But at that point, we cannot have the object yet (because we don't know which commit we are talking about). Would this make more sense as an associated function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. So below we have a commit function -- as well as others, but I think commit will be the most commonly used. It takes a FromRevision. So let's see some examples of types that impl FromRevision:

impl FromRevision for Oid { /* code */ }

/// &str can be parsed into an `Oid`.
impl FromRevision for &str { /* code */ }

/// A reference can be looked up in the repository and its `Oid` is returned
impl FromRevision for RefString { /* code */ }

/// [ref]@{date}
/// https://git-scm.com/docs/gitrevisions#Documentation/gitrevisions.txt-emem
pub struct RefDate { name: Option<RefString>, date: Date }

impl FromRevision for RefDate { /* code */ }

/// [ref]@{n}
/// https://git-scm.com/docs/gitrevisions#Documentation/gitrevisions.txt-emltrefnamegtltngtemegemmaster1em
pub struct RefOrdinal { name: Option<RefString>, ordinal: usize }

impl FromRevision for RefOrdinal { /* code */ }

Now we can use any of these in commit:

commit(storage, refname!("refs/heads/main"))
commit(storage, "e9e23f49641b6b9a850969e5fb87f301fc10d2b9")
commit(storage, "main@{n}".parse::<RefOrdinal>().unwrap())

Is that clear? Would adding this to the document help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understood Self in this case was a Commit, Tree, etc. that had an Oid. But the idea is to implement the trait for everything that may express an Oid, not for the entities that are identified by one.

Yeah, then it makes sense. I'd add those examples so that it's clear, without having to write prose about "representing an Oid instead of being an object that is represented by an Oid."

But from the examples, would it make sense to equate this trait to TryInto/Into? I don't have a strong opinion on having a specific trait vs. TryInto, so just leaving this here for thought since I think we're going to have a lot of traits flying around 😅 .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understood Self in this case was a Commit, Tree, etc. that had an Oid. But the idea is to implement the trait for everything that may express an Oid, not for the entities that are identified by one.

Aye, hence my whole spiel about git revisions and linking to the documentation 😄

Yeah, then it makes sense. I'd add those examples so that it's clear, without having to write prose about "representing an Oid instead of being an object that is represented by an Oid."

Makes sense, can do!

But from the examples, would it make sense to equate this trait to TryInto/Into? I don't have a strong opinion on having a specific trait vs. TryInto, so just leaving this here for thought since I think we're going to have a lot of traits flying around

While that would be nice, all the types would have to be augmented with the backing store so that Oids could be looked up. Having a new trait allows us to say that storage is a parameter.

With that being more clear now, what's your opinion on this formulation? Useful or not useful? Would you prefer/see a different approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is definitely useful - I'll walk through a few scenarios (listing the repo, see the history of a file, etc.) to ensure we're not missing anything but I think this formulation makes sense.

pub fn diff<C: Commitish>(old: Old<C>, new: New<C>) -> Result<Diff, Error>
```

## Conclusion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a missing point is a high-level overview of how this API would be used for some common tasks:

  • Show all files and directories in a repository
  • Show the repo history
  • Show the history of a file
  • Show a file's content for a specific commit (I assume this is part of the objectives of the API)

For example:

Show all files and directories at root of main branch of repository:

let rev = Revision::from_str("refs/heads/main");
let root_commit = Commit::from_rev(rev);
let root_dir = directory(&storage, root_commit);

// process root_dir iterator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, are you saying an examples section would be useful? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's my long-winded way of saying that :D

Like "examples of how we think this API would be used". I can write that tomorrow as a way of giving this a second review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahaha, I just wanted to clarify 🙏

provided.

The general mechanism for looking at a history of commits is called a
[`revwalk`][libgit-revwal] in most `git` libraries. This provides a
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[`revwalk`][libgit-revwal] in most `git` libraries. This provides a
[`revwalk`][libgit-revwalk] in most `git` libraries. This provides a

///
/// However, some other revisions require parsing and/or looking at the
/// storage, which may result in an `Error`.
pub trait FromRevision {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From* traits usually have a function that returns Self, but here we return an Oid. Wouldn't it make more sense to call this simply Revision? A Revision is any type that can be peeled to an Oid..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I originally thought FromRevision because it's going from a revision to an Oid, but Revision works for me too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, yeah I think it would work better, because FromRevision implies that the implementor of the trait isn't a Revision, since it's being converted "from a revision"; but in this case, all trait implementors are valid revisions.


/// Resolve the type to its `Commit`, if possible.
fn peel(&self, storage: &Storage) -> Result<Commit, Self::Error>;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between a revision and a commitish?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that a revision can be used to refer to any type of object whereas commitish implies that whatever you pass needs to resolve to a Commit. Technically, revision implies commitish so you could have:

/// The `Revision` *must* resolve to a `Commit`.
impl<R: Revision> Commitish for R {
  type Error = Error;
  
  fn peel(&self, storage: &Storage) -> Result<Commit, Self::Error> {
    let oid = Revision::peel(self)?;
    /* try resolve it to a Commit, i.e. find_commit, find_tag -> commit, etc.*/
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought that's what a revision was. See https://git-scm.com/docs/git-rev-parse#_specifying_revisions

I think if you're trying to retrieve a blob for example, you wouldn't expect to pass a revision, but rather an oid directly, since named refs typically don't point to blobs, only commits.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A revision parameter typically, but not necessarily, names a commit object.

but not necessarily :)

I think if you're trying to retrieve a blob for example, you wouldn't expect to pass a revision, but rather an oid directly, since named refs typically don't point to blobs, only commits.

Ya, typically. But there's also revisions that point to tags :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, typically. But there's also revisions that point to tags :)

But tags resolve to commits in the end.. I guess I'd be curious to see examples when you'd accept a Commitish as a function argument, and accepting a Revision wouldn't work, or something like that, which would be a motivating example.

The work on the refactoring document[0] has been extremely useful in
pinpointing problem areas in the radicle-surf redesign.

This document intends to supplement that by outlining a high-level API
that the redesign can follow. The design specifies a series of
components and recommended ways of implementing those components. The
exact implementation details are avoided. It also aims to provide more
education around different concepts in git by supplying git
documentation and explanations of the different types one may come
across in git.

[0]: https://github.com/radicle-dev/radicle-git/blob/main/radicle-surf/docs/refactor-design.md

Signed-off-by: Fintan Halpenny <fintan.halpenny@gmail.com>
Signed-off-by: Fintan Halpenny <fintan.halpenny@gmail.com>
X-Clacks-Overhead: GNU Terry Pratchett
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants