Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] CLI and init improvements #509

Closed
wants to merge 14 commits into from
Closed

Conversation

groutr
Copy link
Contributor

@groutr groutr commented Mar 31, 2020

Description of changes

  1. Removed expensive logic from __init__. CLI logic should not reside in this file. This file is executed on every import. All the logic that was here should be __main__.py. We also don't need to set a recursion limit anymore.
  2. Moved all the register_arguments functions to __main__. This speeds up the help pages by at least an order of magnitude (Improve startup time #472 (comment)). It has the effect of making augur feel a lot faster/responsive.

I performed some analysis with a few of the augur commands. I observed the following

  1. Commands like augur version and the help pages reduced the number of function calls by 95-98% (from 930k to under 30k). These commands are fast now.
  2. Most other commands that I tried had a very minor performance improvement due to making around 5000 fewer function calls.

Related issue(s)

Fixes #472

Testing

This still needs to be done. The CLI interface should be the same after these changes. Tests can be written here. I'm welcoming feedback during this testing phase.

Thank you for contributing to Nextstrain!

@groutr
Copy link
Contributor Author

groutr commented Mar 31, 2020

One of the things that I noticed is that there are many required options? Wouldn't it make more sense to have those as positional arguments? It seems strange to have these listed in the "optional arguments" in the help documentation. An example is --sequences in augur align.

@groutr groutr changed the title [WIP] CLI and init fixes [WIP] CLI and init improvements Mar 31, 2020
@elebow
Copy link
Contributor

elebow commented Mar 31, 2020

On required options instead of positional arguments:

Required options often provide a usability benefit in that they convey the intent of the command clearly. When reading a Snakefile, for example, a user does not need to recall whether a sequences file comes before or after a metadata file. Yes, the user could easily refer to the documentation, but using required options makes it unmistakable right there in the command. It's usually worth the extra characters.

I agree that it would be better for required options to not be described as "optional". Argparse offers this through "argument groups" (https://docs.python.org/3/library/argparse.html#argument-groups). Though the usage example at the top of the help text does omit the brackets for required args.

The argparse documentation says elsewhere:

Note: Required options are generally considered bad form because users expect options to be optional, and thus they should be avoided when possible.

I don't necessarily agree with that, especially for a tool with as complex a UI as augur—it's not reasonable to expect the user to remember the order of positional arguments for each of augur's modes.

@groutr
Copy link
Contributor Author

groutr commented Mar 31, 2020

@elebow, I'll explore the argument groups idea. Hopefully that will make things a little less confusing without breaking any kind of compatibility.

Another idea I've seen somewhere is the concept of "named" arguments where the name of the argument is prefixed to the positional argument. When the arguments are parsed, the prefixed positional arguments get matched up to the correct positional arguments based on the prefix.

@danielsoneg
Copy link

As always, not my project, just a rando on the internet, etc, but -

I am not in favor of this change - specifically, pulling all of the command arguments into a single file. I think that substantially reduces readability and maintainability of the code. The rest I'm fine with, but I think we're taking a serious hit on code legibility to save a second or two of user time.

@tsibley
Copy link
Member

tsibley commented Apr 1, 2020

Thanks for the effort here, @groutr. The startup time gains you've made are great! and I really appreciate them, but I'm not in favor of this PR in its current form. Readability and maintainability of the codebase is a much greater priority than a second or two of startup time, and having each command's argument handling in its own file helps with that a lot. Keep in mind that most augur invocations end up embedded into workflow files and run computations where the startup times we're talking about are inconsequential in comparison. I agree that you do feel the lag when exploring the augur interface interactively, and ideally it wouldn't be there.

Regarding required options: I'm in agreement with @elebow here. It's also worth noting that the interface Augur has went through a lot of thought and consideration by the whole Nextstrain team a bit ago resulting in the current conventions.

I do think splitting out the required options into a "Required parameters" argument group would be great! and I don't expect that to affect compatibility. This group could be showed in help output first and thus highlight the required things, with the default "Optional arguments" group following.

Removed expensive logic from __init__. CLI logic should not reside in this file. This file is executed on every import. All the logic that was here should be __main__.py.

I'm pretty sure you know this already :-), but your statement reads unclear to me, so I want to make sure everyone is on the same page. My understanding, which I just verified, is that augur/__init__.py is only executed once per process, at first import of its package or a subpackage. It won't be executed again on every subsequent import of augur.* packages. I agree it doesn't need to be in the execution path of non-CLI Augur modules, but that can be accomplished with a smaller, less intrusive change.

We also don't need to set a recursion limit anymore.

That recursion limit tweaking has nothing to do with the code you changed. It is used when using Augur to process very unbalanced deep trees and allows recursive functions in other libraries to keep walking down the tree. It needs to be kept.

One dev practice which I think is worth mentioning now, though that I certainly don't expect you to know in advance because it isn't documented: For feature/topic branches, we prefer to incorporate newer changes from master by rebasing it rather than merging master into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve startup time
4 participants