-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
marker stats #10009
marker stats #10009
Conversation
rasa/core/evaluation/marker_stats.py
Outdated
extracted_markers = json.load(json_file) | ||
return extracted_markers | ||
class _CSVWriter: | ||
"""A csv writer.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a CSVWriter wrapper class as it is not doing much at the moment? Why not use csv module directly?
csv_writer = CSVWriter(stream)
csv_writer.writerow(some_text)
# vs
csv_writer = csv.writer(stream)
csv_writer.writerow(some_text_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was meant as a workaround because of the type of that csv_writer, because I pass the csv_writer to some functions later (and hence need typing there). But in the solution above the BaseIO thing is also not right....
Hm... I guess I could replace this attempt by a Protocol. Then I can get rid of the need to type the stream 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done :)
num_markers: int = 3, | ||
num_sessions_min: int = 2, | ||
num_sessions_max: int = 10, | ||
) -> Tuple[List[Dict[Text, List[EventMetaData]]], Dict[Text, List[List[int]]]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm new to type annotations, and this is just ghastly to me 😱 (just to be clear, I'm not saying anything about code, just that annotations add a lot of boilerplate code and it kind of beats the advantages of a dynamic language?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree this is ugly. I could give these things some names... but I don't know how much that really helps 🤔 . I'll try to think of something anyway - thanks for the nudge :)
( If we had pandas
in our dependencies it would be much simpler 😅 . A lot of what's going on in that code is building a multi-index-column dataframe without a dataframe :D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On that note, is there a reason why we can't include pandas in our dependencies? Was it something you've already considered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we've searched for that but it isn't included yet :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly some comments about docs and just checking my understanding. Looks good!
rasa/core/evaluation/marker_stats.py
Outdated
|
||
def process( | ||
self, | ||
extracted_markers: Dict[Text, List[EventMetaData]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for a single dialogue/session right? Might be good to mention in the docs to disambiguate between functions over dialogues, trackers, and collections of trackers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand the workflow is we feed dialogues into this one at a time, and then the state of this class tracks all of them for later output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point - the extracted_marker
and the comment is not so informative -- and you understand it exactly as I meant it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understood it as per-session markers when I saw sender_id
and session_idx
, but I agree it's better to say explicitly what it does in the docstrings.
Does it make sense to order the arugments as:
sender_id: Text,
session_idx: int,
extracted_markers: Dict[Text, List[EventMetaData]],
I think that would make it clearer (and consistent with the final output).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
def _header() -> List[Text]: | ||
return [ | ||
"sender_id", | ||
"session_idx", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we use dialogue_id
in the output file, we probably want to be consistent here too - which do you prefer? I can change it in the CLI PR as well to match if that's better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I wasn't thinking of that -- I just remembered Gregs comment and thought this was more precise and that we should change it in marker.py later (can do that in the validate PR) -- yep, I think we should rename 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that we should keep session as it's more precise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kept as is :)
Co-authored-by: Matthew Summers <m.summers@rasa.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My brain fried trying to follow the long test, but I think everything looks sound. Left a couple of questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉 Looks like it's missing the CLI bits (of course), but otherwise all here
self.num_preceding_user_turns_collected = { | ||
marker_name: [] for marker_name in self._marker_names | ||
} | ||
# NOTE: we could stream / compute them later instead of collecting them... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to ticket this or record the enhancement somewhere outside the codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added to #9962
Proposed changes:
Status (please check what you already did):
black
(please check Readme for instructions)