Skip to content

Dependency cli parsing #673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Dependency cli parsing #673

wants to merge 12 commits into from

Conversation

bcardiff
Copy link
Member

@bcardiff bcardiff commented May 16, 2025

Related to #672, this PR adds the logic to parse dependencies from the cli. Still not used, but worth discussing syntax and implementation.

The Dependency class unfortunately is not fit to generate the shard.yaml. Resolver key and source are normalized. I created a separate type.

The added spec shows the various syntax supported.

To create a proper dependency, with the shard name, we need to actually fetch the shard to resolve that. This is done in DependencyDefinition.from_cli while the DependencyDefinition.parts_from_cli is just the parsing without side effects.

In general {resolver}:{source} syntax is supported, but there are some shorthands for convenience.

  • git@...
  • github:foo/bar
  • github:Foo/Bar#1.2.3 (which will mean = 1.2.3)
  • https://github.com/foo/bar and other known hosts like gitlab, bitbucket, and codeberg
  • https://github.com/foo/bar#1.2.3 and other known hosts like gitlab, bitbucket, and codeberg
  • https://github.com/Foo/Bar/commit/000000
  • https://github.com/Foo/Bar/tree/v1.2.3 (which is interpreted as a tag since it's prefixed with v)
  • https://github.com/Foo/Bar/tree/some/branch
  • relative and absolute paths

Some of these are maybe controversial, feel free to discuss.

If you want to play around you can use the following code.

require "./src/dependency_definition"

def t(s)
  d = Shards::DependencyDefinition.from_cli(s)
  pp! d

  puts "\nshard.yaml"
  puts(YAML.build do |yaml|
    yaml.mapping do
      d.to_yaml(yaml)
    end
  end)

  puts "\nshard.lock"
  puts(YAML.build do |yaml|
    yaml.mapping do
      d.dependency.to_yaml(yaml)
    end
  end)
end

t("github:crystal-lang/crystal-db")

t("github:crystal-lang/crystal-db#0.13.0")

which will output

d # => #<Shards::DependencyDefinition:0x1011fbbd0
 @dependency=
  #<Shards::Dependency:0x101217e60
   @name="db",
   @requirement=*,
   @resolver=
    #<Shards::GitResolver:0x10120f200
     @local_path=
      "/Users/bcardiff/.cache/shards/github.com/crystal-lang/crystal-db.git",
     @name="unknown",
     @origin_url="https://github.com/crystal-lang/crystal-db.git",
     @source="https://github.com/crystal-lang/crystal-db.git",
     @updated_cache=true>,
   @resolver_key=nil,
   @source=nil>,
 @resolver_key="github",
 @source="crystal-lang/crystal-db">

shard.yaml
---
db:
  github: crystal-lang/crystal-db

shard.lock
---
db:
  git: https://github.com/crystal-lang/crystal-db.git
d # => #<Shards::DependencyDefinition:0x1011fb240
 @dependency=
  #<Shards::Dependency:0x1012179b0
   @name="db",
   @requirement=Shards::VersionReq(@patterns=["~> 0.13.0"]),
   @resolver=
    #<Shards::GitResolver:0x10120f200
     @local_path=
      "/Users/bcardiff/.cache/shards/github.com/crystal-lang/crystal-db.git",
     @name="unknown",
     @origin_url="https://github.com/crystal-lang/crystal-db.git",
     @source="https://github.com/crystal-lang/crystal-db.git",
     @updated_cache=true>,
   @resolver_key=nil,
   @source=nil>,
 @resolver_key="github",
 @source="crystal-lang/crystal-db">

shard.yaml
---
db:
  github: crystal-lang/crystal-db
  version: 0.13.0

shard.lock
---
db:
  git: https://github.com/crystal-lang/crystal-db.git
  version: 0.13.0

@ysbaddaden
Copy link
Contributor

I think I'd still go the other way around: from a CLI arg that directly targets a resolver and have it parse the URL into a dependency, and fallback to iterate each resolver to parse the URL for known hosts when no CLI arg is specified, and fail asking for an explicit resolver when we can't know what the remote is.

@straight-shoota
Copy link
Member

straight-shoota commented May 16, 2025

I would go the other way: encode the full information in a single URI.
This should work well. Most of the time, the natural formats are already URI-compatible. That's a fully descriptive, self-contained format that's easy to reuse in other contexts.

The implementation of different strategies is quite simple: We parse the value as a URI and route it to a resolver-specific parser based on the scheme (e.g. git, github, git+https, etc. go to GitResolver).

If the URI scheme does not indicate a specific resolver (e.g. https), we fall back to heuristics based on the full URI (e.g. based on known hostname, or file extension) or potentially even trial-and-error through possible resolvers (very optional).

@bcardiff
Copy link
Member Author

For GitHub, from the clone menu we should support

I want to support the most common format the user can copy paste directly.

And I want to preserve as much as possible user intent. That's why I don't want the Resolver to offer a canonical representation of the dependency for the shard.yaml. Eg: urls are downcased in the normalization and that will look odd.

I saw mentions of git+ssh somewhere. IIUC those will be git@github.com:owner/repo.git, am I missing another pattern?

Thanks for the feedback.

@bcardiff
Copy link
Member Author

Oh and should I move parsing and rendering to a DependencyDefinition type to avoid having these changes in dependency but not used in all places?

@straight-shoota
Copy link
Member

We could probably turn the named tuple return type from .parts_from_cli into a dedicated type which holds this information.

@bcardiff bcardiff force-pushed the dependency-cli-parsing branch from 20d0567 to eafca30 Compare May 16, 2025 13:56
@bcardiff
Copy link
Member Author

I think I incorporated all of the feedback.

expect_parses("github:Foo/Bar@1.2.3", "github", "Foo/Bar", VersionReq.new("~> 1.2.3"))

# GitHub urls
expect_parses("https://github.com/foo/bar", "github", "foo/bar", Any)
Copy link
Member

@straight-shoota straight-shoota May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm wondering about why this gets normalized to github resolver instead of "git", "https://github.com/foo/bar".
Both options are valid. So we only need to decide which one we want to pick.
I suppose the reason for github is that it's more concise? That's nice but not strictly necessary.
git would be closer to the original input.

Copy link
Member Author

@bcardiff bcardiff May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me closer to the intent is to have github: foo/bar, because I'm assuming the user is copy-pasting the url in the browser. We do preserve the format with trailing .git as those are copied from the clone popup.

At some point maybe it's worth configuring shards to map all github source to be resolved in some specific way, but is a separate story.

expect_parses("..\\relative\\windows", "path", "../relative/windows", Any)
{% end %}
# Path file schema
expect_parses("file://#{local_relative}", "path", local_relative, Any)
Copy link
Member

@straight-shoota straight-shoota May 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: This is technically incorrect. The URL file://a/releative/path means path /relative/path on host a. The // indicates an authority.

The correct URL for a local path would be file:a/relative/path.

I believe some applications implicitly interpret file://a/b/c as file:a/b/c because this is a common mistake.
The file scheme is not intended for remote access. We can consider doing that, but I don't think it's a good idea as it would solidify invalid behaviour.
Instead, the best way to handle this is probably to error if the host of a file URI is anything else but empty or localhost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, we can send a warning about the preferred form, but still accept it.

Let's be nice with our users. I've used that format before myself.

@straight-shoota
Copy link
Member

straight-shoota commented May 20, 2025

Following up on #673 (comment), I propose an alternative implementation, fully based on URI parsing in #678

IMO this makes it much easier to handle things. The different routes are clearly decided by the URI scheme. All parsing is safe and sound using stdlib's URI class.

@straight-shoota
Copy link
Member

For reference, VCS dependencies in PIP: https://pip.pypa.io/en/stable/topics/vcs-support/

@bcardiff
Copy link
Member Author

The alternative implementation seems good. We can close this PR then in favor of the other. Anything else that is missing?, because it's marked as draft.

@straight-shoota
Copy link
Member

#678 is a patch against the branch of this PR. We should merge that in and then continue here.

@bcardiff
Copy link
Member Author

Commit pushed into this branch

@straight-shoota
Copy link
Member

The windows test is failing because for relative URIs we expect .starts_with?("./"), .starts_with?("../") to recognize a relative path by prefix.
If we want to support path's with backslash separators as well, we need to add .starts_with?(".\\"), .starts_with?("..\\")

Co-authored-by: Johannes Müller <straightshoota@gmail.com>
Comment on lines 82 to 88
source = uri.to_s
# narrow down requirement
requirement = Any
if source.includes?("@")
source, version = source.split("@")
requirement = VersionReq.new("~> #{version}")
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: Splitting on "@" over the entire URL is error prone. It would also match the user info portion of a URI.
git:user@host:repo parses into source user with version requirement ~> host:repo.
This mechanism must be restricted to the path of the URI.

Even then, there's a chance for misinterpretation because @ is a completely valid character in a URI path. Might not be likely to encounter shards at such a path, but I'm wondering if it's worth it. IMO the @ syntax is not essential.
We could consider a different format for expressing version constraints concisely in the URL.

Nix flake uses query parameters to encode version information (e.g. git+https://example.com/foo/bar?branch=master)
https://nix.dev/manual/nix/2.28/command-ref/new-cli/nix3-flake#examples

I think this is not entirely elegant because query parameters have a different purpose. They're supposed to be resolved by the remote party.

The URI fragment is more elegant for this. It represents resource-internal information resolved by the local party.
Our URIs point to shard repositories, that's the resource. And navigating to a specific portion of a resource is the purpose of a fragment.

So URLs would look like this: github:Foo/Bar#1.2.3, git+https://example.com/foo/bar#1.2.3.

We can further enhance this by adding support for parameters to the fragment, thus being able to resolve fragments like #branch=master, or #tag=1.2.3-rc1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that # force the need of quoting in the shell?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You can use URLs with fragments perfectly fine in shell commands.

The shell considers # as a comment only if it's preceded by whitespace.

$ echo foo#bar
foo#bar
$ echo foo #bar
foo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using fragments for version numbers with the same semantics currently have @ seems good to me.

Tag and branch (and revision/commit) can come later imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may work, but other package managers (cargo, bundler, ...) use @version, so using something else like #version is confusing, and # has a special meaning in shells (comment), which only increases the confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact shards encourage semver, calver seems good enough for me to make @ or whatever syntax that mentions just a version to mean ~>.

I would leave out for now any other version operator.

I don't mean to jump. I mean to discuss and decide. I don't want to not support ~> version restriction out in this first cut.

Copy link
Member

@straight-shoota straight-shoota Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is not about whether it's generally reasonable or practical to interpret a plain version as ~>.
My concern is that in all other applications that use a similar reference scheme, a plain version means exactly that version. Not a lower bound / pessimistic restriction for the actual version.
Introducing different semantics than what people are used from other tools causes confusion. I think we should try to avoid that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitively @ is = in my mind. # is ok if we don't find anything better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So for ~> it should be github:foo/bar#~>1.2.3 ?

I think is important to support ~> so we can have a good copy paste instruction with version restriction and we should encourage ~> as much as possible.

Copy link
Member

@straight-shoota straight-shoota Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should encourage ~> as much as possible.

This might be diverting this discussion even more than it already is. But I'm not sure I can agree on that.
What's the reason?
I don't think we have ever recommended version restrictions anywhere in installation instructions for any shard before.

@bcardiff
Copy link
Member Author

bcardiff commented Jun 8, 2025

Made some changes to use # instead of @ while supporting it also in other syntax like git@ https:// etc and not just github:

{% end %}
# Path file schema
expect_raises Shards::Error, "Invalid file URI" do
Shards::DependencyDefinition.parts_from_cli("file://#{local_relative}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a warning or a message explicitly explaining why this is not accepted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can iterate on the warnings and messages once this is integrated in the CLI, but yes 👍

@beta-ziliani
Copy link
Member

beta-ziliani commented Jun 10, 2025

I gave my opinion on the high level of it. I think it's loable to center on the user, in aspects like being able to copy-paste from the browser. Thanks Branch!

@luislavena
Copy link
Contributor

Folks, I think we are too much into the weeds about the pessimistic comparator versus an specific version that we are forgetting that not all shells are going to parse ~> as you expect.

Point of example, zsh:

$ cat shards
#!/usr/bin/env sh

echo "called with $@"
$ ./shards add github:foo/bar
called with add github:foo/bar

But fails with pessimistic one:

$ ./shards add github:foo/bar#~>1.2.3

$
$ cat 1.2.3
called with add github:foo/bar#~

This will force the user to quote the parameter:

$ ./shards add "github:foo/bar#~3.2.1"
called with add github:foo/bar#~3.2.1

@bcardiff
Copy link
Member Author

Ok, leaving # as exact version. (Or I can drop that too).

Adding a version: manually is easy, definitely, but this whole thing is to avoid the user editing the shard.yml manually if possible. Leaving version information out seems a miss to me.

Deferring the discussion on how to support ~> 1.2.3 as a uri-sh parameter for later (maybe something along the way of #compat=1.2.3 to avoid the >.

I do want to move in to the actual code of changing the yaml based on this 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants