Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Success Markers CLI #9824

Merged
merged 20 commits into from
Nov 1, 2021
Merged

Success Markers CLI #9824

merged 20 commits into from
Nov 1, 2021

Conversation

usc-m
Copy link
Contributor

@usc-m usc-m commented Oct 7, 2021

Proposed changes:

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@usc-m usc-m requested review from aeshky and TyDunn October 7, 2021 16:51
@usc-m
Copy link
Contributor Author

usc-m commented Oct 7, 2021

So I've reached a bit of an impasse on a few fronts here:

  1. I'm not sure if this is the CLI design we want
  2. I don't have an interface to go against for writing out marker files or computing them from the trackers that I can go against to complete implementation

I've made this PR a draft to facilitate discussion on these points (@TyDunn I've tagged you because you have opinions on CLI design/might be aware of others who do as well) before I proceed further with implementing them. At the moment the sketch of an implementation I have here is pretty much exactly what we have in the implementation proposal, however, there was some discussion of unifying them into a single command, and also discussion around where these should live.

Here's some ideas I've seen floated or had:

Current
This splits extract and stats into two subcommands under markers, extract pulls trackers in from a tracker store specified by the endpoint config (which has its argument handled by the existing default argument stuff for that) according to a strategy (--by - would need a default if it were optional), up to a number --num-trackers (useless if strategy is all). If a seed is set that is used to set up the random number generator (useless if strategy is not sample).
Optionally, stats can be extracted to a file at this stage.

stats takes a file produced by extract and produces a stats file.

Unique properties this has (can be interpreted positively or negatively depending on what our priorities are):

  • Markers has it's own subcommand, quite a prominent feature on its own
    -- Could be useful if we were to expand what could be done with markers
  • Extract and stats are separated

Unanswered questions for this one that I can think of:

  • Are these the correct optional/positional parameters?
  • For the optionals, what should their defaults be?
  • Does this model the most common way of using the feature? (i.e. are people always going to want to be working with files or are they looking for quick stats printed to the terminal?)
  • Should we also print stats to the terminal?
rasa markers extract [--stats <FILENAME>] [--num-trackers <NUM>] [--by <first-n|sample|all>] [--seed <SEED>] [--endpoints <ENDPOINT_CONFIG>] <OUTPUT_FILENAME>

rasa markers stats <INPUT_FILENAME> <OUTPUT_FILENAME>

Variation
One idea I had was to try and roll both commands into one and sit it under an evaluate namespace, as I feel like that's more likely to be useful as a top-level subcommand later. This version assumes we want to print stats by default and moves file exporting to an option for both extraction and summary statistics. Some problems I've encountered with this though are:

  • Not fully clear what the default invocation would be
  • Still no clarity on default values of parameters
  • Not sure if this models the intended workflow better, but feels a bit cleaner to me
rasa evaluate markers [--extract <FILENAME>] [--stats-file <FILENAME>] [--endpoints <ENDPOINT_CONFIG>]  [--num-trackers <NUM>] [--by <first-n|sample|all>] [--no-stats]

As for the issue of interfaces, we don't yet have a central markers class/data structure which I can use to implement functions to write out/read in markers from disk, or write out stats. This is proving to be a bit of a blocker for this ticket and the other ticket I have at the moment (#9661), since I'm not sure how to access marker data in memory.

We have samples which I can derive a schema of the on-disk representation from the PR closing #9662, but I don't want to duplicate effort which would otherwise be done in #9660. If we can define some of these interfaces now that would unblock me and we can somewhat parallelise the work (though the merge order will probably be weird, and it might be difficult to test this PR until #9660 is in).

@usc-m
Copy link
Contributor Author

usc-m commented Oct 14, 2021

After discussing with the team and voting on options, we've decided on rasa evaluate markers as the overall command. The extract and stats components will be rolled into one, with the default being to extract and produce stats (and stats can be disabled with a flag). This would look roughly like:

rasa evaluate markers [--no-stats] [--endpoints <ENDPOINTS_CONFIG>] <by_first_n|by_sample|all> [-n <NUM_TRACKERS>] [--seed <SEED>] <OUTPUT_FILENAME>

Extract would extract from the tracker store specified, using the strategy specified (-n and --seed will be put into a mutually exclusive group depending on the strategy), and save the results into OUTPUT_FILENAME (e.g. markers.json). Stats would be saved into a modified version of the output filename (e.g. markers.stats.json), unless disabled by the flag

@usc-m usc-m force-pushed the 9658/success-markers-cli branch from 15041ec to 165082d Compare October 27, 2021 16:58
Copy link
Contributor

@ka-bu ka-bu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

rasa/core/evaluation/marker_base.py Outdated Show resolved Hide resolved
rasa/cli/evaluate.py Outdated Show resolved Hide resolved
rasa/cli/arguments/evaluate.py Outdated Show resolved Hide resolved
rasa/cli/evaluate.py Outdated Show resolved Hide resolved
@usc-m usc-m requested a review from joejuzl October 28, 2021 10:18
@usc-m usc-m marked this pull request as ready for review October 28, 2021 10:18
@usc-m usc-m requested a review from a team as a code owner October 28, 2021 10:18
Copy link
Contributor

@joejuzl joejuzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 Cool to see it all coming together :)

default="markers.yml",
type=str,
help="The config file(s) containing marker definitions. This can be a single "
"YAML file, or a directory that contains several files with domain "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "domain" mean here? Like domain-specific configs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just copied this over from the domain version of this absentmindedly, fixing it to be "marker definitions"


add_endpoint_param(
parser,
help_text="Configuration file for the model server and the connectors as a "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the key thing worth mentioning here is that it contains the configuration for connecting to the tracker store right?

"--seed", type=int, help="Seed to use if selecting trackers by 'sample'"
)
parser.add_argument(
"count", type=int, help="The number of trackers to extract markers from",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be different from the "count" in set_markers_first_n_arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly but it arises because of sub-parsers (already commented on them on another comment)


markers_subparser = marker_parser.add_subparsers(dest="strategy")

markers_first_n_subparser = markers_subparser.add_parser(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done with argument i.e. with add_mutually_exclusive_group? Feels a bit weird doing it with subparsers...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check again, I looked into how rasa test does it and it looks like it uses sub-parsers (though I agree it's a bit odd). I think I could do it with add_mutually_exclusive_group and choices though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I had a look, and I'm not sure that's possible easily - add_mutually_exclusive_group lets us force one of the args in the group to be picked, but we have a situation where one option has 2 args that are allowed, one has one arg, and one has none. Those are all dependent on the value of strategy (which we could force with choices), but I'm not sure we can do mutual exclusion another way without the use of sub-parsers here (which I agree looks quite nasty)

Comment on lines 95 to 96
seed: int,
count: int,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at _run_markers_cli these can be None? (Optional)

processed_trackers = {}

for tracker in tracker_loader.load():
tracker_result = markers.evaluate_events(tracker.events)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a reason we don't pass a list of trackers in so all the logic is encapsulated in markers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't want to have a function that just loops but back then I didn't think of writing results .... so not really, we could (think you're right, we should) create a function in markers.py that gets an iterator over markers and a filestream and do the work there (plus a just a path to the statsfile)

return processed_trackers


def _save_results(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels to me like this logic shouldn't be in the cli package. Is there a reason we don't have it in rasa.core.evaluation.marker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we were torn - at first it was going to just be json.dumps into a file and it seemed too trivial to worry about, but then we changed the format and now it probably should live with the marker package. I'll move it

@@ -638,6 +638,16 @@ def write_endpoint_config_to_yaml(
return endpoints_path


def write_markers_config_to_yaml(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is only used in one test file it can probably live there.

@usc-m usc-m force-pushed the 9658/success-markers-cli branch from 165082d to 1d45b90 Compare October 28, 2021 16:24
@usc-m usc-m requested review from joejuzl and ka-bu October 28, 2021 16:30
@usc-m usc-m changed the title (WIP) Success Markers CLI Success Markers CLI Oct 28, 2021
Copy link
Contributor

@ka-bu ka-bu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good :) Just some tiny suggestions on the tests

rasa/core/evaluation/marker_base.py Outdated Show resolved Hide resolved
)

@staticmethod
def _write_dialogue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_write_relevant_event_information ?

@@ -485,6 +488,86 @@ def from_config(config: MarkerConfig, name: Optional[Text] = None) -> Marker:
marker.name = name
return marker

def export_markers(
self,
tracker_loader: MarkerTrackerLoader,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about...

Suggested change
tracker_loader: MarkerTrackerLoader,
tracker_loader: Iterator[Optional[DialogueStateTracker]]:

and then we just pass tracker_loader.load() (which for some reason has the "optional", so we also need to check for tracker == None below, don't we?) -- which makes it a tiny bit easier to add an isolated test for this :) (and no worries, I think we can just reuse/move the test you already wrote)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good catch on the optional - I remember why I did that but I forgot about it. It has to be optional because retrieve on the tracker store interface returns an optional (which I assume is to handle things like your database falls over and your request times out, or something has been deleted in between you requesting the list of keys and requesting the value for a specific key)

None, 10, endpoints_path, "first_n", markers_path, results_path, None
)

with open(results_path, "r") as results:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have the new export now inside Marker along with all the logic of what is exported -- How about we just check here that the file can be loaded with the csv reader, and move the below (and copy the above example, without the tracker loader/store to a test_export in the marker tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'm not sure how we check it can be loaded with the CSV reader because I'm pretty sure it processes the CSV lazily - so we can't know it's a valid CSV without checking the whole file. I guess just iterate all rows and assert that it doesn't raise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, I would just iterate over it 👍

Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>
Copy link
Contributor

@aeshky aeshky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 🚀
I think export_markers should be unit tested in the test_marker.py as kathrin suggested.

@@ -485,6 +488,86 @@ def from_config(config: MarkerConfig, name: Optional[Text] = None) -> Marker:
marker.name = name
return marker

def export_markers(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add explicit unit tests for these new functions outside of the CLI tests?

tracker_result = self.evaluate_events(tracker.events)
processed_trackers[tracker.sender_id] = tracker_result

Marker._save_results(output_file, processed_trackers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get various type warnings in pycharm in this file. Should we fix these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 working on those as part of the config validation task -- we can ignore them for now

Copy link
Contributor

@joejuzl joejuzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@usc-m usc-m requested review from aeshky and ka-bu October 29, 2021 15:25
Copy link
Contributor

@ka-bu ka-bu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@usc-m usc-m enabled auto-merge (squash) October 29, 2021 15:32
aeshky and others added 3 commits November 1, 2021 10:55
Had to move WriteRow protocol from stats to utils since it was needed in more than one place and direct import caused circularity. Had to also address linting errors
Protocol doesn't exist in Py3.7
@usc-m usc-m merged commit 18ca40c into main Nov 1, 2021
@usc-m usc-m deleted the 9658/success-markers-cli branch November 1, 2021 13:27
@aeshky
Copy link
Contributor

aeshky commented Nov 1, 2021

🎉

tayfun pushed a commit that referenced this pull request Nov 1, 2021
* Create stubs to start filling out Aciel's snippet

Co-authored-by: Aciel Eshky <a.eshky@rasa.com>

* Sketch out basics of CLI arguments

* Start changing over args to new format

* Make CLI match description from design pass

* Add tests

* Switch over timestamps to event IDs

* Fix tests now event IDs are supported

* Start work on doc comments

* Switch over event_id fix

Also removes otherwise-unused import to stop circularity

* Address PR comments

* Fix incorrect file extension

* Codeclimate

* Update rasa/core/evaluation/marker_base.py

Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>

* Address PR comments

* Move test to marker

* Rename dialogue to session

Co-authored-by: Aciel Eshky <a.eshky@rasa.com>
Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>
indam23 pushed a commit that referenced this pull request Jul 27, 2022
* Create stubs to start filling out Aciel's snippet

Co-authored-by: Aciel Eshky <a.eshky@rasa.com>

* Sketch out basics of CLI arguments

* Start changing over args to new format

* Make CLI match description from design pass

* Add tests

* Switch over timestamps to event IDs

* Fix tests now event IDs are supported

* Start work on doc comments

* Switch over event_id fix

Also removes otherwise-unused import to stop circularity

* Address PR comments

* Fix incorrect file extension

* Codeclimate

* Update rasa/core/evaluation/marker_base.py

Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>

* Address PR comments

* Move test to marker

* Rename dialogue to session

Co-authored-by: Aciel Eshky <a.eshky@rasa.com>
Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Success markers: create new markers CLI
4 participants