Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

marker stats #10009

Merged
merged 14 commits into from
Oct 29, 2021
Merged

marker stats #10009

merged 14 commits into from
Oct 29, 2021

Conversation

ka-bu
Copy link
Contributor

@ka-bu ka-bu commented Oct 28, 2021

Proposed changes:

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@ka-bu ka-bu requested a review from a team as a code owner October 28, 2021 17:03
@ka-bu ka-bu requested review from tayfun, usc-m and a team and removed request for a team and tayfun October 28, 2021 17:03
@ka-bu ka-bu assigned aeshky and ka-bu and unassigned aeshky Oct 28, 2021
@ka-bu ka-bu requested review from aeshky and removed request for a team October 28, 2021 17:03
extracted_markers = json.load(json_file)
return extracted_markers
class _CSVWriter:
"""A csv writer."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a CSVWriter wrapper class as it is not doing much at the moment? Why not use csv module directly?

csv_writer = CSVWriter(stream)
csv_writer.writerow(some_text)
# vs
csv_writer = csv.writer(stream)
csv_writer.writerow(some_text_

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was meant as a workaround because of the type of that csv_writer, because I pass the csv_writer to some functions later (and hence need typing there). But in the solution above the BaseIO thing is also not right....

Hm... I guess I could replace this attempt by a Protocol. Then I can get rid of the need to type the stream 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done :)

num_markers: int = 3,
num_sessions_min: int = 2,
num_sessions_max: int = 10,
) -> Tuple[List[Dict[Text, List[EventMetaData]]], Dict[Text, List[List[int]]]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to type annotations, and this is just ghastly to me 😱 (just to be clear, I'm not saying anything about code, just that annotations add a lot of boilerplate code and it kind of beats the advantages of a dynamic language?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree this is ugly. I could give these things some names... but I don't know how much that really helps 🤔 . I'll try to think of something anyway - thanks for the nudge :)

( If we had pandas in our dependencies it would be much simpler 😅 . A lot of what's going on in that code is building a multi-index-column dataframe without a dataframe :D)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On that note, is there a reason why we can't include pandas in our dependencies? Was it something you've already considered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we've searched for that but it isn't included yet :/

Copy link
Contributor

@usc-m usc-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly some comments about docs and just checking my understanding. Looks good!

rasa/core/evaluation/marker_stats.py Outdated Show resolved Hide resolved
rasa/core/evaluation/marker_stats.py Outdated Show resolved Hide resolved
rasa/core/evaluation/marker_stats.py Outdated Show resolved Hide resolved
rasa/core/evaluation/marker_stats.py Outdated Show resolved Hide resolved

def process(
self,
extracted_markers: Dict[Text, List[EventMetaData]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for a single dialogue/session right? Might be good to mention in the docs to disambiguate between functions over dialogues, trackers, and collections of trackers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand the workflow is we feed dialogues into this one at a time, and then the state of this class tracks all of them for later output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point - the extracted_marker and the comment is not so informative -- and you understand it exactly as I meant it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood it as per-session markers when I saw sender_id and session_idx, but I agree it's better to say explicitly what it does in the docstrings.
Does it make sense to order the arugments as:

  sender_id: Text,
  session_idx: int,
  extracted_markers: Dict[Text, List[EventMetaData]],

I think that would make it clearer (and consistent with the final output).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

def _header() -> List[Text]:
return [
"sender_id",
"session_idx",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we use dialogue_id in the output file, we probably want to be consistent here too - which do you prefer? I can change it in the CLI PR as well to match if that's better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I wasn't thinking of that -- I just remembered Gregs comment and thought this was more precise and that we should change it in marker.py later (can do that in the validate PR) -- yep, I think we should rename 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree that we should keep session as it's more precise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept as is :)

Co-authored-by: Matthew Summers <m.summers@rasa.com>
Copy link
Contributor

@aeshky aeshky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My brain fried trying to follow the long test, but I think everything looks sound. Left a couple of questions.

@ka-bu ka-bu requested a review from usc-m October 29, 2021 15:31
Copy link
Contributor

@usc-m usc-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉 Looks like it's missing the CLI bits (of course), but otherwise all here

self.num_preceding_user_turns_collected = {
marker_name: [] for marker_name in self._marker_names
}
# NOTE: we could stream / compute them later instead of collecting them...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to ticket this or record the enhancement somewhere outside the codebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to #9962

@ka-bu ka-bu enabled auto-merge (squash) October 29, 2021 16:05
@ka-bu ka-bu merged commit 52ad480 into main Oct 29, 2021
@ka-bu ka-bu deleted the marker/stats branch October 29, 2021 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Success Markers: cleanup/fix stats
4 participants