-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Success Markers CLI #9824
Success Markers CLI #9824
Conversation
So I've reached a bit of an impasse on a few fronts here:
I've made this PR a draft to facilitate discussion on these points (@TyDunn I've tagged you because you have opinions on CLI design/might be aware of others who do as well) before I proceed further with implementing them. At the moment the sketch of an implementation I have here is pretty much exactly what we have in the implementation proposal, however, there was some discussion of unifying them into a single command, and also discussion around where these should live. Here's some ideas I've seen floated or had: Current
Unique properties this has (can be interpreted positively or negatively depending on what our priorities are):
Unanswered questions for this one that I can think of:
Variation
As for the issue of interfaces, we don't yet have a central markers class/data structure which I can use to implement functions to write out/read in markers from disk, or write out stats. This is proving to be a bit of a blocker for this ticket and the other ticket I have at the moment (#9661), since I'm not sure how to access marker data in memory. We have samples which I can derive a schema of the on-disk representation from the PR closing #9662, but I don't want to duplicate effort which would otherwise be done in #9660. If we can define some of these interfaces now that would unblock me and we can somewhat parallelise the work (though the merge order will probably be weird, and it might be difficult to test this PR until #9660 is in). |
After discussing with the team and voting on options, we've decided on
Extract would extract from the tracker store specified, using the strategy specified (-n and --seed will be put into a mutually exclusive group depending on the strategy), and save the results into OUTPUT_FILENAME (e.g. markers.json). Stats would be saved into a modified version of the output filename (e.g. markers.stats.json), unless disabled by the flag |
15041ec
to
165082d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 Cool to see it all coming together :)
rasa/cli/arguments/evaluate.py
Outdated
default="markers.yml", | ||
type=str, | ||
help="The config file(s) containing marker definitions. This can be a single " | ||
"YAML file, or a directory that contains several files with domain " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "domain" mean here? Like domain-specific configs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I just copied this over from the domain
version of this absentmindedly, fixing it to be "marker definitions"
rasa/cli/arguments/evaluate.py
Outdated
|
||
add_endpoint_param( | ||
parser, | ||
help_text="Configuration file for the model server and the connectors as a " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the key thing worth mentioning here is that it contains the configuration for connecting to the tracker store right?
"--seed", type=int, help="Seed to use if selecting trackers by 'sample'" | ||
) | ||
parser.add_argument( | ||
"count", type=int, help="The number of trackers to extract markers from", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be different from the "count" in set_markers_first_n_arguments
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly but it arises because of sub-parsers (already commented on them on another comment)
|
||
markers_subparser = marker_parser.add_subparsers(dest="strategy") | ||
|
||
markers_first_n_subparser = markers_subparser.add_parser( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be done with argument i.e. with add_mutually_exclusive_group
? Feels a bit weird doing it with subparsers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check again, I looked into how rasa test
does it and it looks like it uses sub-parsers (though I agree it's a bit odd). I think I could do it with add_mutually_exclusive_group
and choices though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I had a look, and I'm not sure that's possible easily - add_mutually_exclusive_group
lets us force one of the args in the group to be picked, but we have a situation where one option has 2 args that are allowed, one has one arg, and one has none. Those are all dependent on the value of strategy
(which we could force with choices
), but I'm not sure we can do mutual exclusion another way without the use of sub-parsers here (which I agree looks quite nasty)
rasa/cli/evaluate.py
Outdated
seed: int, | ||
count: int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at _run_markers_cli
these can be None
? (Optional
)
rasa/cli/evaluate.py
Outdated
processed_trackers = {} | ||
|
||
for tracker in tracker_loader.load(): | ||
tracker_result = markers.evaluate_events(tracker.events) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason we don't pass a list of trackers in so all the logic is encapsulated in markers
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't want to have a function that just loops but back then I didn't think of writing results .... so not really, we could (think you're right, we should) create a function in markers.py that gets an iterator over markers and a filestream and do the work there (plus a just a path to the statsfile)
rasa/cli/evaluate.py
Outdated
return processed_trackers | ||
|
||
|
||
def _save_results( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels to me like this logic shouldn't be in the cli
package. Is there a reason we don't have it in rasa.core.evaluation.marker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we were torn - at first it was going to just be json.dumps
into a file and it seemed too trivial to worry about, but then we changed the format and now it probably should live with the marker
package. I'll move it
tests/conftest.py
Outdated
@@ -638,6 +638,16 @@ def write_endpoint_config_to_yaml( | |||
return endpoints_path | |||
|
|||
|
|||
def write_markers_config_to_yaml( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is only used in one test file it can probably live there.
Co-authored-by: Aciel Eshky <a.eshky@rasa.com>
Also removes otherwise-unused import to stop circularity
165082d
to
1d45b90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good :) Just some tiny suggestions on the tests
rasa/core/evaluation/marker_base.py
Outdated
) | ||
|
||
@staticmethod | ||
def _write_dialogue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_write_relevant_event_information ?
rasa/core/evaluation/marker_base.py
Outdated
@@ -485,6 +488,86 @@ def from_config(config: MarkerConfig, name: Optional[Text] = None) -> Marker: | |||
marker.name = name | |||
return marker | |||
|
|||
def export_markers( | |||
self, | |||
tracker_loader: MarkerTrackerLoader, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about...
tracker_loader: MarkerTrackerLoader, | |
tracker_loader: Iterator[Optional[DialogueStateTracker]]: |
and then we just pass tracker_loader.load() (which for some reason has the "optional", so we also need to check for tracker == None below, don't we?) -- which makes it a tiny bit easier to add an isolated test for this :) (and no worries, I think we can just reuse/move the test you already wrote)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good catch on the optional - I remember why I did that but I forgot about it. It has to be optional because retrieve
on the tracker store interface returns an optional (which I assume is to handle things like your database falls over and your request times out, or something has been deleted in between you requesting the list of keys and requesting the value for a specific key)
None, 10, endpoints_path, "first_n", markers_path, results_path, None | ||
) | ||
|
||
with open(results_path, "r") as results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have the new export now inside Marker along with all the logic of what is exported -- How about we just check here that the file can be loaded with the csv reader, and move the below (and copy the above example, without the tracker loader/store to a test_export
in the marker tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I'm not sure how we check it can be loaded with the CSV reader because I'm pretty sure it processes the CSV lazily - so we can't know it's a valid CSV without checking the whole file. I guess just iterate all rows and assert that it doesn't raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, I would just iterate over it 👍
Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 🚀
I think export_markers
should be unit tested in the test_marker.py
as kathrin suggested.
@@ -485,6 +488,86 @@ def from_config(config: MarkerConfig, name: Optional[Text] = None) -> Marker: | |||
marker.name = name | |||
return marker | |||
|
|||
def export_markers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add explicit unit tests for these new functions outside of the CLI tests?
tracker_result = self.evaluate_events(tracker.events) | ||
processed_trackers[tracker.sender_id] = tracker_result | ||
|
||
Marker._save_results(output_file, processed_trackers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get various type warnings in pycharm in this file. Should we fix these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 working on those as part of the config validation task -- we can ignore them for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
Had to move WriteRow protocol from stats to utils since it was needed in more than one place and direct import caused circularity. Had to also address linting errors
Protocol doesn't exist in Py3.7
🎉 |
* Create stubs to start filling out Aciel's snippet Co-authored-by: Aciel Eshky <a.eshky@rasa.com> * Sketch out basics of CLI arguments * Start changing over args to new format * Make CLI match description from design pass * Add tests * Switch over timestamps to event IDs * Fix tests now event IDs are supported * Start work on doc comments * Switch over event_id fix Also removes otherwise-unused import to stop circularity * Address PR comments * Fix incorrect file extension * Codeclimate * Update rasa/core/evaluation/marker_base.py Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com> * Address PR comments * Move test to marker * Rename dialogue to session Co-authored-by: Aciel Eshky <a.eshky@rasa.com> Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>
* Create stubs to start filling out Aciel's snippet Co-authored-by: Aciel Eshky <a.eshky@rasa.com> * Sketch out basics of CLI arguments * Start changing over args to new format * Make CLI match description from design pass * Add tests * Switch over timestamps to event IDs * Fix tests now event IDs are supported * Start work on doc comments * Switch over event_id fix Also removes otherwise-unused import to stop circularity * Address PR comments * Fix incorrect file extension * Codeclimate * Update rasa/core/evaluation/marker_base.py Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com> * Address PR comments * Move test to marker * Rename dialogue to session Co-authored-by: Aciel Eshky <a.eshky@rasa.com> Co-authored-by: Kathrin Bujna <kathrin.bujna@gmail.com>
Proposed changes:
Status (please check what you already did):
black
(please check Readme for instructions)