Allow Enum Variants as Arg|ArgGroup|SubCommand keys #1104
Comments
Even without a derivable component we'll need to iron out the constraints that are part of clap which |
Is this still being considered? I really like this approach for a number of reasons. I saw that the recent blog post didn't mention it, which is why I am asking. I'm not sure if this is possible, and I think this would rely on rust-lang/rfcs#2593. Anyway, I imagine the following scenario. The user defines an enum. Each variant represents an argument. You call For example (pseudocode):
// in Clap:
struct Arg<Variant> {
    // everything here remains the same
}
// here `Args` is the enum itself
struct App<Args> {
    ...
    fn arg<T: Into<Arg<Self::Args>>>(self, a: T) -> Self
    fn get_matches(self) -> ArgMatches<Self::Args>
}
// here `Args` is the enum itself
impl ArgMatches<Args> {
    ...
    fn get_value<Variant: Self::Args> -> &str {
        ...
    }
    fn get_values ...
}
// user code (no argument name strings!):
enum MyArgs {
    Username,
}
let matches = App<MyArgs>::new("foo")
    .arg(
        Arg<MyArgs::Username>::new().short("u")
    )
    .get_matches();
let username = matches.get_value::<MyArgs::Username>();
Alternatively, if the Rust team ends up deciding to allow you to restrict enum variants more (not currently planned AFAIK):
// in Clap:
// here `Args` refers to a variant of an enum
struct Arg<Variant: Args> {
    fn new_single() { ... } // Only allowed if the variant has an associated string
    fn new_multiple() { ... } // Only allowed if the variant has an associated iterator
    fn new_boolean() { ... } // Only allowed if the variant has an associated boolean
}
// here `Args` is the enum itself
impl ArgMatches<Args> {
    ...
    fn get_value_automatic<Variant: Self::Args> -> Variant {
        ...
    }
}
// user code (no argument name strings!):
enum MyArgs {
    Usernames(IntoIterator<String>),
}
let matches = App<MyArgs>::new("foo")
    .arg(
        Arg<MyArgs::Usernames>::new_multiple().short("u")
    )
    .get_matches();
let MyArgs::Usernames(usernames) = matches.get_value();
This way, there is a single function
Anyway, this is all still up in the air since it seems like the Rust team hasn't settled on exactly what features they plan to add to enums. What do you think? |
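The pseudocode above depends on enum variants being usable as types, which Rust does not offer today. A rough approximation that compiles on current Rust is to pass the variant as a value and make the container generic over the key type. The sketch below is only an illustration: `Matches`, `insert`, and `get_value` are hypothetical names, not part of clap.

```rust
// Hypothetical sketch: none of these types are clap's real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MyArgs {
    Username,
    Verbose,
}

/// Stand-in for a parsed-matches container keyed by an enum instead of a string.
struct Matches<K> {
    values: Vec<(K, String)>,
}

impl<K: Copy + PartialEq> Matches<K> {
    fn new() -> Self {
        Matches { values: Vec::new() }
    }

    /// Record a value for the given key (a real parser would do this internally).
    fn insert(&mut self, key: K, value: &str) {
        self.values.push((key, value.to_string()));
    }

    /// Look a value up by enum variant; a misspelled variant fails to compile.
    fn get_value(&self, key: K) -> Option<&str> {
        self.values
            .iter()
            .find(|(k, _)| *k == key)
            .map(|(_, v)| v.as_str())
    }
}

fn main() {
    let mut matches = Matches::new();
    matches.insert(MyArgs::Username, "alice");
    println!("{:?}", matches.get_value(MyArgs::Username));
    println!("{:?}", matches.get_value(MyArgs::Verbose));
    // matches.get_value(MyArgs::Usernam); // rejected by the compiler: no such variant
}
```

The trade-off is the one raised later in the thread: the key type leaks into every signature that touches the container.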
Is there a workaround to this? |
I tried to read up on past efforts for enums, but the blog post was removed as part of the new site and for some reason the Wayback Machine doesn't have that one article. @kbknapp, do you happen to have this hanging around somewhere? |
I believe I do. I'm on mobile at the moment and will look it up shortly once I get to a computer. |
I found it. Here is the post content:
# Issue: Removing Strings
One of the biggest complaints about clap is the “stringly typed” nature of arguments. You give them a string name, and retrieve their value by string. Where this goes wrong is if you typo a string name, you won’t know until you hit that code path. Combine that with the fact that clap goes to decent lengths to ensure the clap::ArgMatches struct keeps memory usage low, which means stripping out any arguments which weren’t used at runtime. This means the clap::ArgMatches struct doesn’t know if you typo’ed an argument value, or it simply wasn’t used at runtime. The solution to the above problem would be keeping a list of all arguments in the clap::ArgMatches struct. However, for CLIs that have hundreds of arguments, the memory usage adds up quickly. So there are two (actually three) problems being conflated here:
## Challenge: Typo’ing Argument Names
The solution to this would be to use enum variants. This way the compiler can tell you that no such enum variant exists if you typo it. The question then is how to store such a value in each Argument (and clap) for parsing to be able to compare. Another constraint I placed on myself is that I don’t want existing clap users to have to upgrade en masse. If you’re happy using Strings, or you have a CLI that works, why spend the resources to change it? That means the requirements are:
Before I talk about the solutions I tried, and the one I ultimately chose, let me speak about the other challenges.
## Challenge: clap::ArgMatches has no knowledge of defined arguments
Ok, so now we have to come up with a solution which also:
If we quickly look at the current naive solution of storing Strings, so we know our upper bound, it would look something like (note: the actual types are nested and shown here as essentially primitives for brevity):
struct ArgMatchesV2 {
    args: Vec<(String, Vec<OsString>)>,
    subcommand: Box<ArgMatches>,
    ...
}
struct ArgMatchesV3 {
    args: Vec<(String, Option<Vec<OsString>>)>,
    subcommand: Vec<(String, Box<ArgMatches>)>,
    ...
}
I put the That doesn’t look different! You’re right. However, if one used two arguments at runtime (out of a possible valid 100 for instance) Since a String is essentially a Vec, and a Vec is a pointer plus length (usize) plus capacity (usize), that means it’s an additional 98 * (usize*3) plus whatever the actual String bytes are that are allocated. If we just say an average of five bytes per String for simple math, and on my system a pointer/usize is 8 bytes, the difference between the above ArgMatchesV2 and ArgMatchesV3 is:
That’s just args. Look at subcommands: Same math as before, but with the way subcommands work you can only ever have a single subcommand per “level”. However, now we’d need to store all possible subcommands, which, even if Box::new doesn’t allocate for all the subcommands we didn’t use, is still an additional String (49 bytes on average) plus a pointer (8 bytes). Now subcommand numbers usually don’t go as high as args, possibly 30-50 at most. Let’s just say 25 for easy math:
Giving us a grand total of:
Ok we have our baseline.
## Challenge: Iterating and comparing strings is slow
Ok, whatever solution we go with, iterating Arg structs (and thus whatever this new field is) should be faster than a bunch of Strings. Since we’ve already gone over the math and breakdown of a String, we know it involves a heap lookup (since Strings are essentially a Vec), so if we can avoid touching the heap that’d be gravy.
## Solution 1: Generics
My first attempt at solving this was to use a generic type bound on PartialEq. While implementing this, I had an icky feeling though. All my type signatures were changing to clap::Arg<T: PartialEq> which expands out to clap::ArgMatches<T: PartialEq>. This has a few unfortunate side effects:
So this doesn’t seem like a good route. Next I tried Trait Objects.
## Solution 2: Trait Objects
Ok, so what if instead of a type T: PartialEq I just stored &'a PartialEq (or Box)? This fixes the problem of T being huge, now T is just a pointer (8 bytes). It also fixes the type signature as people still would just use clap::Arg. However, I’m still touching the heap each time I want to compare or find an argument. I put this on the back burner as a maybe.
## Solution 3: Hashing
Ok, so what if I simply hash T and get a consistent output (I picked u64, or 8 bytes)? Sure, I’ll have to hash all 100 T’s at definition time, but clap iterates and compares each argument far more than once per argument. So it seems a small price to pay, especially if I only need a fast hashing algorithm instead of a super secure one. So I picked an FNV hash implementation, which is insanely fast. Hashing a T using FNV is only a few ns for all the types of T I tested. Using this, clap::Arg can now store an Without going through all the math again, here are grand totals using u64 instead of String (not to mention u64 doesn’t allocate any additional bytes for its content):
Much improved. The jury is still out on whether I’ll provide the ability to know if an argument was used or not. 1,000 bytes (nearly 1KB) is still a good bit, compared to the 24 it could be. Granted, this is for a large CLI (100+ args and 25+ subcommands at a single level), so actual uses should be lower. We’ll see, although 1,000 bytes is much easier to swallow than over six times that. The biggest downside is each access now must be hashed. The reason I’m OK with this is that access is much less frequent, and typically only happens once. So assuming 100 arguments, we’re comparing 200 hashes with that of storing, iterating, and comparing a larger heap based type for more than 200 times. I haven’t counted lately, but due to how parsing and validation occurs the number of times an argument gets compared is in the order of (NUM_ARGS * NUM_USED_ARGS)*~3ish. Concretely that means, if you use an average of 5 arguments per run, out of a possible valid 100, there are roughly 1,500 comparisons of equality. For CLIs like ripgrep which can have hundreds or thousands of arguments per run, this number can dwarf 200 hashes easily.
## Status
I’m in the middle of implementing this change. It’s somewhat conflated with re-working how the parsing happens in order to be more efficient. This is the last major change prior to releasing v3-beta1 which I’m hoping will happen in the coming weeks/month. |
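The u64 figures quoted in the post can be rechecked with a few lines of arithmetic. The sketch below assumes the post's example sizes (100 defined args, 25 subcommands, 5 args used per run) and reads the 24-byte figure as two used args plus one used subcommand at 8 bytes each, which is my interpretation rather than something the post states outright. The helper names are made up for illustration.

```rust
use std::mem::size_of;

/// Bytes needed to store `count` keys of type `K` inline (heap contents not counted).
fn inline_key_bytes<K>(count: usize) -> usize {
    count * size_of::<K>()
}

/// Rough comparison count from the post: (defined args * used args) * ~3.
fn approx_comparisons(defined: usize, used: usize) -> usize {
    defined * used * 3
}

fn main() {
    // Every defined key kept around: 100 args plus 25 subcommands as u64.
    println!("all keys:  {} bytes", inline_key_bytes::<u64>(100 + 25));
    // Only what was used at runtime: 2 args plus 1 subcommand (assumed reading).
    println!("used keys: {} bytes", inline_key_bytes::<u64>(2 + 1));
    // A String is pointer + length + capacity before counting its heap bytes.
    println!("String header: {} bytes", size_of::<String>());
    // 5 used args out of 100 defined.
    println!("comparisons: {}", approx_comparisons(100, 5));
}
```

On a 64-bit target the String header is three words, so each name stored as a String costs three times a u64 key before any of its characters are counted.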
@pksunkara I propose this issue and #1663 be deferred to the clap4 milestone.
|
With that out of the way, some quick thoughts: Clap has several nearly distinct APIs
In considering our care abouts, we need to consider each API and what the expected role of it is. For example, if the expectation is someone will probably use the derive, unless they want to get rid of the overhead of proc macros, then derives for custom keys (#1663) seems contradictory. Similarly, the derive API prevents the problems with stringly typed APIs. On a related note, I'm unsure if we can generate the needed enums for the derive API to use anything but strings. We can't see through to other enums and structs to generate a global list. We'd need to support a composable enum and then somehow allow all of the typing for As for performance, while keys use less memory, We might be able to resolve the generics problem via type aliases (e.g. One problem with enums is that we default Regarding the hash work in the above blog post, one concern brought up at the time was collisions. I can't find the discussions to see how that was resolved. I propose we rename See playground for example code
Benefits
Downsides
|
Just wanted to call out #604 is the issue for "error on typo" |
In thinking on this more, I propose we reject this
@pksunkara what are your thoughts? |
Those are good reasons for me to agree with. But I think we should let @kbknapp weigh in too. |
Hi. So far the workaround I have been using is strum, to convert enums to str slices.
use strum_macros::{AsRefStr, Display, EnumString};

#[derive(EnumString, Display, AsRefStr)]
pub enum Args {
    User,
}

// Placeholder for the application logic.
fn run(user: &str) {
    println!("user: {}", user);
}

fn main() {
    let app = clap::App::new("app");
    let matches = app
        .version("0.1.0")
        .usage("app -user")
        .arg(
            // The enum variant supplies the argument name, so it is spelled only once.
            clap::Arg::with_name(Args::User.as_ref())
                .short("u")
                .takes_value(true),
        )
        .get_matches_safe();
    match matches {
        Ok(matches) => {
            // Panics if -u was not given; handle the None case in real code.
            let user = matches.value_of(Args::User.as_ref()).unwrap();
            run(user);
        }
        Err(e) => {
            println!("{}", e);
        }
    }
}
https://github.com/Peternator7/strum
Maybe something like this could be made more ergonomic, with additional attributes. |
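For anyone who wants the same workaround without a proc macro, the conversion can be written by hand. This is a minimal sketch using only the standard library: the lowercase names are my own choice, and `lookup` is a hypothetical stand-in for any API that accepts a name as `AsRef<str>`, not a clap function.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Args {
    User,
    Verbose,
}

impl Args {
    /// The single place where each argument name is spelled out.
    pub const fn as_str(self) -> &'static str {
        match self {
            Args::User => "user",
            Args::Verbose => "verbose",
        }
    }
}

// Lets a variant be passed anywhere an `AsRef<str>` name is accepted.
impl AsRef<str> for Args {
    fn as_ref(&self) -> &str {
        self.as_str()
    }
}

/// Hypothetical stand-in for a string-keyed lookup such as a matches table.
fn lookup<'a, S: AsRef<str>>(pairs: &[(&'a str, &'a str)], name: S) -> Option<&'a str> {
    pairs
        .iter()
        .find(|(k, _)| *k == name.as_ref())
        .map(|(_, v)| *v)
}

fn main() {
    let parsed = [("user", "alice")];
    println!("{:?}", lookup(&parsed, Args::User));
    println!("{:?}", lookup(&parsed, Args::Verbose));
}
```

A typo in a variant name is now a compile error, while the underlying storage stays string-based.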
@ta32 Yes, strum or something like Let's start with the remaining API changes. We'd either need to
Each of the options using a generic type would only work with clap code that is also generic on that type, limiting code sharing. We then have to deal with the recursive nature of this across subcommands. Most likely, we will end up with users creating a single Then from a user's perspective, they would most likely need to use a proc macro to help them manage this enum, like strum. The question for me to understand is the overlap between users who want to use the builder API but are still ok with proc macros (one reason to avoid the derive API is to save on compile times). On the other hand, today without this feature, users can do the workaround you provided, which is similar to my Option 3. If we decide against this feature, we'd probably want to make an example out of your code snippet. So in balancing out the different aspects, it makes me wonder if the derive API and workarounds like yours are sufficient. |
With the above reasoning, I'm going to go ahead and close this. This was an ambitious idea but unfortunately, it doesn't seem like it's going to pay off. I did note the workarounds discussed in this thread in #1041 so we can make sure we continue to support them. |
I.e. remove the "stringly typed" nature of clap
How this is done internally is up for debate and the reason for this issue. See the related topic #1041
Basically we want to Hash whatever key is given and store that instead of a &str/String/Cow<_>.
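As a rough illustration of hashing a key down to a fixed-size id, the sketch below implements 64-bit FNV-1a on top of the standard `Hasher` trait and applies it to an enum variant. It is not clap's implementation: `Fnv1a` and `key_id` are hypothetical names, and the variant of FNV is an assumption on my part, since the thread only says "FNV".

```rust
use std::hash::{Hash, Hasher};

/// 64-bit FNV-1a, written out here so the example needs no external crate.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical helper: reduce any hashable key to a u64 id.
fn key_id<K: Hash>(key: K) -> u64 {
    let mut hasher = Fnv1a::default();
    key.hash(&mut hasher);
    hasher.finish()
}

#[derive(Hash)]
enum MyArgs {
    Username,
    Verbose,
}

fn main() {
    // The same key always produces the same id within a build.
    println!("{:#x}", key_id(MyArgs::Username));
    println!("{:#x}", key_id(MyArgs::Verbose));
    // Existing string keys go through the same path.
    println!("{:#x}", key_id("username"));
}
```

Two distinct keys can in principle hash to the same u64, which is the collision concern mentioned earlier in the thread. A real design would need a policy for that case.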