Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser): handle negative numeric tokens #62

Closed
wants to merge 1 commit into from

Conversation

bcoe
Copy link
Collaborator

@bcoe bcoe commented Feb 6, 2022

Handle a bunch of finicky edge-cases around negative positionals, and negative numbers as arguments.

From yargs' test suite:

  it.only('should handle parsing negative #s', function () {
    const argv = parser([
      '-33', '-177', '33',
      '--n1', '-33',
      '-n', '-44',
      '--n2=-55',
      '--foo.bar', '-33',
      '-o=-55',
      // Note: one major change in this library from yargs-parser
      // is the dropping of greedy arrays and nargs.*/
      '--bounds', '-180', '--bounds=99', '--bounds', '-180', '--bounds', '90',
      '--other', '-99', '--other', '-220'
    ], {
      multiples: ['bounds', 'other'],
      withValue: ['n', 'n1', 'n2', 'foo.bar']
    })
    console.info(argv);
    argv._.should.deep.equal([-33, -177, 33])
    argv.n1.should.equal(-33)
    argv.n.should.equal(-44)
    argv.n2.should.equal(-55)
    // TODO implement dot properties:
    // argv.foo.bar.should.equal(-33)
    argv.o.should.equal(-55)
    argv.bounds.should.deep.equal([-180, 99, -180, 90])
    argv.other.should.deep.equal([-99, -220])
  })

Also address issue with parsing of -o=foo, for short opts.

TODO: add tests.

@bcoe bcoe requested a review from shadowspawn February 6, 2022 16:31
@@ -175,6 +187,28 @@ const parseArgs = (
return result;
};

// Look ahead to next token in argv array:
function peek(argv, pos) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC: bakkot, @shadowspawn, @ljharb, a step towards a token based parser, as discussed in #52.

Benefit is it encapsulates the logic, and simplifies reading the codebase, e.g.,:

  } else if (peek(argv, pos) !== TOKENS.FLAG) {

// Add 'i' to 'pos' such that short options are parsed in order
// of definition:
ArrayPrototypeSplice(argv, pos + (i - 1), 0, `-${short}`);
}
}

const suffix = StringPrototypeSlice(arg, 2); // maybe -f=bar.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a test for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suffix code is only for what I think is mistaken processing of =, per previous comments.

(On a separate note, we use arg fairly loosely though this code. We could try for some more meaningful names, but I suggest try that is a pure refactor and not when we are changing code.)

@bcoe
Copy link
Collaborator Author

bcoe commented Feb 6, 2022

Related: #60

function token(arg) {
const chr = StringPrototypeCharAt(arg, 0);
const nextChr = StringPrototypeCharAt(arg, 1);
if (StringPrototypeMatch(chr, /^[0-9]$/)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (StringPrototypeMatch(chr, /^[0-9]$/)) {
if (RegExpPrototypeSymbolMatch(/^[0-9]$/, chr)) {

const nextChr = StringPrototypeCharAt(arg, 1);
if (StringPrototypeMatch(chr, /^[0-9]$/)) {
return TOKENS.NUMERIC;
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) {
} else if (chr === '-' && RegExpPrototypeSymbolMatch(/^[0-9]$/, nextChr)) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also use test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only testing for one digit? So -1foo is returning numeric? It does not actually matter too much since to support negative numbers we are dropping support for digits, but I feel misleading to call it NUMERIC based on that test.

if (options.short && options.short[arg])
arg = options.short[arg]; // now long!
// ToDo: later code tests for `=` in arg and wrong for shorts
if (suffix.startsWith('='))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (suffix.startsWith('='))
if (StringPrototypeStartsWith(suffix, '='))

Comment on lines +202 to +204
if (StringPrototypeMatch(chr, /^[0-9]$/)) {
return TOKENS.NUMERIC;
} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Can drop ^ and $ since were only matching one character (Unless I'm overlooking something).
  • Should we use \d for increased unicode support, or does that add more complexity/edge cases?
  • Should we be using .test() since were not using the result of .match()
    e.g. /[0-9]/.test(chr)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good call, this should use RegExpPrototypeTest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to double check, but I believe \d is the same as [0-9] so a matter of taste.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to double check, but I believe \d is the same as [0-9] so a matter of taste.

Needed to double checked myself! Got my languages mixed up (: https://regex101.com/r/PhKpmF/1

@shadowspawn
Copy link
Collaborator

shadowspawn commented Feb 6, 2022

I don't agree in principle with treating negative numbers as special tokens. In brief it clashes with using digits as short options, and adds an opinionated behaviour. Slightly longer version: #60 (comment)

I'm personally 👎

But I expect casual users are more likely to encounter negative numbers than single digit options or other situations where it misfires. So we might decide to implement it anyway...

@bcoe
Copy link
Collaborator Author

bcoe commented Feb 6, 2022

I don't agree in principle with treating negative numbers as special tokens. In brief it clashes with using digits as short options, and adds an opinionated behaviour. Slightly longer version: #60 (comment)

@shadowspawn without special casing negative numbers, it's hard to implement an example like a basic calculator application:

add -300 200

Becomes:

short options: -3 -0 -0, positional 200, which is pretty gross.

If you do want to implement the behavior -3, -0, -0, it seems easier to have it stored as -300` in positionals, and for the implementer to clean it up in post-processing (expand it as if it was three short options).


With regards to -p80, or -p1, I agree with @ljharb, that the 80 only be consumed if p is configured as accepting a value, that seems reasonable -- otherwise again you end up with the parse -p, -8, -0, and it's hard to know that the -8 and -0 bind to the p when working backwards.

@shadowspawn
Copy link
Collaborator

Short version: Yes, there is a trade-off.

Long version

without special casing negative numbers, it's hard to implement an example like a basic calculator application

Yes, that would be a use case that would not be supported out of the box.

Four approaches from a quick think:

  • find the first number in the args and only pass parseArgs the args before that
  • substitute the leading dash in numbers with another character before parsing, and restore afterwards
  • make -- part of the user syntax, so exotic usage might be add --reverse-polish --output=json -- -300 200 (I know this is heavy-handed, but it is built-in and I have seen it used in the wild! Albeit, for reasons other than numbers.)
  • use another library 😄

If you do want to implement the behavior -3, -0, -0, it seems easier to have it stored as -300` in positionals, and for the implementer to clean it up in post-processing (expand it as if it was three short options).

(Tongue in cheek.) Or to rephrase, if you want to use digits as short options, do it yourself. Wait, why are we offering a parser if this is the easy approach?


As an aside. I think this is a subtle example of not always being feasible to build on top of a parser which has applied a number of decisions. Imagine processing this with negative numbers treated as positionals, and then try and post-process digits as options:

parseArgs(['-a -1 -b foo`], { withValues: '1' });

It isn't as simple as looking for the digits in the positionals and taking the next positional, the parser ate some options out of the middle.

@shadowspawn
Copy link
Collaborator

shadowspawn commented Feb 7, 2022

I'll add that there is somewhat less namespace pressure on short options these days, because we can use long-only options for the rare cases. For a user of our library, would still be a loss if a digit was the nicest semantic choice, but there would at least be alternatives available.

Note ls uses all lowercase letters and some uppercase letters and symbols and digits.

SYNOPSIS
     ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when] [-D format] [file ...]

@bcoe
Copy link
Collaborator Author

bcoe commented Feb 7, 2022

what if we treat arguments wrapped in quotes as positionals, or values, and drop the quotes? Then an implementer could pre-process negative numbers by wrapping them in quotes.

@shadowspawn
Copy link
Collaborator

shadowspawn commented Feb 7, 2022

what if we treat arguments wrapped in quotes as positionals, or values, and drop the quotes? Then an implementer could pre-process negative numbers by wrapping them in quotes.

Hmmm, I don't like that as a work-around. Then quotes are special and get eaten by the parser. Note though, if the implementer wrapped the negative numbers in quotes the current parser would not treat them as options. It is "just" the implementer has to remove the quotes themselves after parsing.

@@ -118,20 +128,23 @@ const parseArgs = (
if (arg.length > 2) {
for (let i = 2; i < arg.length; i++) {
const short = StringPrototypeCharAt(arg, i);
// Case of `-o=foo`:
Copy link
Collaborator

@shadowspawn shadowspawn Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not consider this a short option and its value in the way you are assuming. The = is for long options not for short options. The four possible forms for specifying the expected value are:

-v VVV
-vVVV // if we support it
--value VVV
--value=VVV

(Or -v on the end of a combined short group with following argument.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding the = to an argument seems off to me, so I think think that two options are either throw a parsing error, or treat it the same as:

-v foo

What does POSIX do?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learnt something rereading the Utility Conventions. = is not a valid short option name, so we are into undefined behaviour. (I thought I saw something about "printable" somewhere, didn't find that just now.)

I have not checked getopt to see how it deals with this.

Guideline 3:
Each option name should be a single alphanumeric character (the alnum character classification) from the portable character set.

https://pubs.opengroup.org/onlinepubs/9699919799.2016edition/basedefs/V1_chap12.html#tag_12_01

} else if (pos + 1 < argv.length &&
!StringPrototypeStartsWith(argv[pos + 1], '-')
) {
} else if (peek(argv, pos) !== TOKENS.FLAG) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Processing the last arg will now drop into this conditional, whereas previously it would drop through to the next test.

i.e. removed pos + 1 < argv.length test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is processing an option with a following argument available as a value, if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bounds check was moved into peek, if we decide not to handle negative numbers I'll abandon this refactor any ways.

// value and then increment pos so that we don't re-evaluate that
// arg, else set value as undefined ie. --foo b --bar c, after setting
// b as the value for foo, evaluate --bar next and skip 'b'
const val = options.withValue &&
ArrayPrototypeIncludes(options.withValue, arg) ? argv[++pos] :
(ArrayPrototypeIncludes(options.withValue, arg) ||
ArrayPrototypeIncludes(options.multiples, arg)) ? argv[++pos] :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have you added a test for multiples here? That does not look right for a multiple flag. The logic for multiples is in storeOptionValue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior of multiples mixing positional and '='s appeared to be buggy, I can add a test for this separately.

} else if (chr === '-' && StringPrototypeMatch(nextChr, /^[0-9]$/)) {
return TOKENS.NUMERIC;
} else if (chr === '-') {
return TOKENS.FLAG;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two other things we could identify using token() are - and --. But the focus of this PR is negative numbers, so leave that for future refactor.

@shadowspawn
Copy link
Collaborator

Treating negative numbers as positionals complicates the short option group processing, which could previously assume a blind expansion. Behaviour now broken:

% node zero.js -a1
{ flags: { a: true }, values: { a: undefined }, positionals: [ '-1' ] }

(For interest, Commander suffered from similar weird outcomes from blind expansion. To get it right I switched to processing just one short option at a time and no fall-through, go around big loop again to process anything left. Commander has flags and required values and optional values, but not negative numbers. https://github.com/tj/commander.js/blob/02a124c7d58dbae2ef11f9284b2c68ad94f6dc8b/lib/command.js#L1378)

@shadowspawn
Copy link
Collaborator

(Sorry @bcoe , lots of comments after you were running hot trying out the yarns-parser tests!)

@bcoe
Copy link
Collaborator Author

bcoe commented Feb 7, 2022

I would be curious to hear other people's thoughts, re: negative numbers, I can attempt the pre-processing approach if we would prefer not to support them.

Numerics in most cli apps I think are positive integers, delays, ports, etc. Are the common use cases for negative numerics I'm missing?

@ljharb
Copy link
Member

ljharb commented Feb 7, 2022

Quoting props so the shell doesn’t interpret them - like globs - is standard. It seems perfectly fine to me that if i want to have a negative number as a value, my choices are to either use a long form with an = (--a=-1) or to quote the value (-a '-1').

@shadowspawn
Copy link
Collaborator

Quoting works for passing values that would otherwise be interpreted by shell, but does not work directly for protecting things that look like options. The shell removes the quotes before they are passed through.

% node -e 'console.log(process.argv)' hello '-a' '*'
[ '/usr/local/bin/node', 'hello', '-a', '*' ]

@shadowspawn
Copy link
Collaborator

For comparison, in Commander options that expect an argument take the next argument whether or not it starts with a dash. So --foo -123 works fine if option foo is expecting a value. i.e. get { foo: '-123' } in parsed options

There a trade-off, Commander gets an occasional question about why --foo --bar is not an error, when foo expects a value! i.e. get { foo: '--bar' } in parsed options

@shadowspawn
Copy link
Collaborator

Another implementation for interest. Python's arparse goes the extra mile, and allows negative numbers as positional and option-arguments iff there are no digit options defined. This is quite sophisticated. (argparse does not have greedy options, arguments with a starting dash are otherwise options and not option-arguments.)

https://docs.python.org/3/library/argparse.html#arguments-containing

If we went this approach we would have to decide which mode zero-config uses.

@aaronccasanova
Copy link
Collaborator

The more I look at the above examples and the referenced argparse Python module the more I lean towards an explicit configuration based approach for this behavior. The proposed options API in PR #63 (single configuration object) may also be a helpful design pattern for capturing negative numbers.
e.g.

// node calc.js --add -300 200
const { values } = parseArgs({
  options: {
    add: {
      type: 'number',
      nargs: 2, // defaults to 1
    }
  }
})
// values: { add: [-300, 200] }

@bcoe
Copy link
Collaborator Author

bcoe commented Feb 27, 2022

I added nargs and greedy array support to yargs-parser, and regret it. I feel it creates too many opportunities for alternate interpretations of what a parse should like like when you provide additional options after the array argument.

I far prefer the explicit --arr 1 --arr 2 --arr 3.


I'd rather hold off on adding a number type for MVP, and try some sort of pre-processing in the yargs shim I've been working on.

@bcoe bcoe closed this Feb 27, 2022
@ljharb ljharb deleted the fix-negative-numbers branch February 27, 2022 14:56
@shadowspawn
Copy link
Collaborator

I said:

I don't agree in principle with treating negative numbers as special tokens. In brief it clashes with using digits as short options, and adds an opinionated behaviour.

I want to expand on that after thinking about it again in greedy/choosy context. I still don't like the idea of detecting negative numbers as such...

However, a different way of reasoning about this is that digits as options are rare, so we might choose not to recognise them. Then --foo -2 can be an option and its value because -2 is not an option (rather than because -2 is a number). And -789 can be a positional because -7 is not an option (rather than because -789 is a number).

This allows negative numbers to appear as option values or as positionals, which is what Python's argparse does when there are no digit options declared, and would allow a calculator per #62 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants