marker stats #10009

ka-bu · 2021-10-28T17:03:15Z

Proposed changes:

closes Success Markers: cleanup/fix stats #9961

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)


tayfun · 2021-10-29T09:36:54Z

rasa/core/evaluation/marker_stats.py

-        extracted_markers = json.load(json_file)
-        return extracted_markers
+class _CSVWriter:
+    """A csv writer."""


Do we need a CSVWriter wrapper class, given that it is not doing much at the moment? Why not use the csv module directly?

csv_writer = CSVWriter(stream)
csv_writer.writerow(some_text)
# vs
csv_writer = csv.writer(stream)
csv_writer.writerow(some_text)

It was meant as a workaround because of the type of that csv_writer, because I pass the csv_writer to some functions later (and hence need typing there). But in the solution above the BaseIO thing is also not right....

Hm... I guess I could replace this attempt by a Protocol. Then I can get rid of the need to type the stream 🤔
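A minimal sketch of the Protocol idea, assuming Python 3.8+ where `typing.Protocol` is available (older versions would need `typing_extensions.Protocol`). The names `RowWriter` and `write_header` are hypothetical and not taken from the PR.

```python
import csv
import io
from typing import Any, Iterable, Protocol


class RowWriter(Protocol):
    """Structural type for anything with a csv-style `writerow` method."""

    def writerow(self, row: Iterable[Any]) -> Any:
        ...


def write_header(writer: RowWriter, columns: Iterable[str]) -> None:
    # Only `writerow` is required, so the stream type never appears here.
    writer.writerow(columns)


stream = io.StringIO()
# The object returned by csv.writer satisfies the protocol structurally.
write_header(csv.writer(stream), ["sender_id", "session_idx"])
header_line = stream.getvalue()
```

With this approach the functions that receive the writer are typed against the one method they call, so neither a wrapper class nor a stream annotation is needed.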

tayfun · 2021-10-29T09:39:58Z

tests/core/evaluation/test_marker_stats.py

+    num_markers: int = 3,
+    num_sessions_min: int = 2,
+    num_sessions_max: int = 10,
+) -> Tuple[List[Dict[Text, List[EventMetaData]]], Dict[Text, List[List[int]]]]:


I'm new to type annotations, and this is just ghastly to me 😱 (just to be clear, I'm not saying anything about code, just that annotations add a lot of boilerplate code and it kind of beats the advantages of a dynamic language?)

Yes, I agree this is ugly. I could give these things some names... but I don't know how much that really helps 🤔 . I'll try to think of something anyway - thanks for the nudge :)

( If we had pandas in our dependencies it would be much simpler 😅 . A lot of what's going on in that code is building a multi-index-column dataframe without a dataframe :D)

On that note, is there a reason why we can't include pandas in our dependencies? Was it something you've already considered?

Yeah we've searched for that but it isn't included yet :/
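The suggestion above to give the nested types names could look roughly like this. It is only a sketch: `EventMetaData` here is a stand-in with assumed fields, and `SessionMarkers`, `PerSessionNumbers` and `make_example` are hypothetical names, not the ones used in the PR.

```python
from typing import Dict, List, NamedTuple, Text, Tuple


class EventMetaData(NamedTuple):
    """Stand-in for the PR's EventMetaData; the fields are assumed."""

    idx: int
    preceding_user_turns: int


# Markers extracted from one session: marker name -> occurrences.
SessionMarkers = Dict[Text, List[EventMetaData]]
# Per marker, one list of numbers for each session.
PerSessionNumbers = Dict[Text, List[List[int]]]


def make_example(
    marker_names: List[Text], num_sessions: int = 2
) -> Tuple[List[SessionMarkers], PerSessionNumbers]:
    """Builds deterministic toy data: one occurrence per marker and session."""
    sessions: List[SessionMarkers] = []
    expected: PerSessionNumbers = {name: [] for name in marker_names}
    for session_idx in range(num_sessions):
        markers: SessionMarkers = {}
        for name in marker_names:
            event = EventMetaData(
                idx=session_idx, preceding_user_turns=session_idx + 1
            )
            markers[name] = [event]
            expected[name].append([event.preceding_user_turns])
        sessions.append(markers)
    return sessions, expected


sessions, expected = make_example(["a", "b"], num_sessions=2)
```

The return annotation still describes the same structure, but each layer now has a name that says what it holds.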

usc-m

Mostly some comments about docs and just checking my understanding. Looks good!

rasa/core/evaluation/marker_stats.py

usc-m · 2021-10-29T10:16:10Z

rasa/core/evaluation/marker_stats.py

+
+    def process(
+        self,
+        extracted_markers: Dict[Text, List[EventMetaData]],


This is for a single dialogue/session right? Might be good to mention in the docs to disambiguate between functions over dialogues, trackers, and collections of trackers

From what I understand the workflow is we feed dialogues into this one at a time, and then the state of this class tracks all of them for later output?

Oh good point - the extracted_markers argument and its comment are not very informative -- and you understand it exactly as I meant it :)

I understood it as per-session markers when I saw sender_id and session_idx, but I agree it's better to say explicitly what it does in the docstrings.
Does it make sense to order the arguments as:

sender_id: Text,
session_idx: int,
extracted_markers: Dict[Text, List[EventMetaData]],

I think that would make it clearer (and consistent with the final output).
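To illustrate the workflow discussed in this thread (one session fed in at a time, with the object keeping state for later output), here is a small sketch using the proposed argument order. The class `SessionStats`, its attributes, and the `EventMetaData` stand-in are assumptions for illustration and do not reproduce the code in `marker_stats.py`.

```python
from typing import Dict, List, NamedTuple, Text, Tuple


class EventMetaData(NamedTuple):
    """Stand-in for the PR's EventMetaData; the fields are assumed."""

    idx: int
    preceding_user_turns: int


class SessionStats:
    """Hypothetical collector: fed one session at a time, keeps state."""

    def __init__(self) -> None:
        self.session_ids: List[Tuple[Text, int]] = []
        self.counts: Dict[Text, List[int]] = {}

    def process(
        self,
        sender_id: Text,
        session_idx: int,
        extracted_markers: Dict[Text, List[EventMetaData]],
    ) -> None:
        """Records how often each marker applied in a single session."""
        self.session_ids.append((sender_id, session_idx))
        for marker_name, events in extracted_markers.items():
            self.counts.setdefault(marker_name, []).append(len(events))


stats = SessionStats()
stats.process("user-1", 0, {"greeted": [EventMetaData(2, 1)]})
stats.process("user-1", 1, {"greeted": []})
```

Putting `sender_id` and `session_idx` first mirrors the column order of the header, which makes it easier to see that each call corresponds to one output row.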

usc-m · 2021-10-29T10:18:29Z

rasa/core/evaluation/marker_stats.py

+    def _header() -> List[Text]:
+        return [
+            "sender_id",
+            "session_idx",


Since we use dialogue_id in the output file, we probably want to be consistent here too - which do you prefer? I can change it in the CLI PR as well to match if that's better

oh, I wasn't thinking of that -- I just remembered Greg's comment and thought this was more precise and that we should change it in marker.py later (can do that in the validate PR) -- yep, I think we should rename 👍

Yeah I agree that we should keep session as it's more precise.

kept as is :)

Co-authored-by: Matthew Summers <m.summers@rasa.com>

aeshky

My brain fried trying to follow the long test, but I think everything looks sound. Left a couple of questions.

usc-m

LGTM 🎉 Looks like it's missing the CLI bits (of course), but otherwise all here

usc-m · 2021-10-29T15:56:34Z

rasa/core/evaluation/marker_stats.py

+            self.num_preceding_user_turns_collected = {
+                marker_name: [] for marker_name in self._marker_names
+            }
+            # NOTE: we could stream / compute them later instead of collecting them...


Might want to ticket this or record the enhancement somewhere outside the codebase

added to #9962
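For reference, the enhancement mentioned in the NOTE (streaming instead of collecting) could be sketched as a running mean per marker. `RunningMean` is a hypothetical helper, not part of the PR, and it only covers the mean; statistics such as the median would still need the collected values or an approximation.

```python
from typing import Dict, Text


class RunningMean:
    """Keeps a count and a mean instead of the full list of values."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental update: new_mean = old_mean + (value - old_mean) / n
        self.mean += (value - self.mean) / self.count


running: Dict[Text, RunningMean] = {"greeted": RunningMean()}
for num_preceding_user_turns in [1, 3, 5]:
    running["greeted"].update(num_preceding_user_turns)
```

Memory use then stays constant per marker regardless of how many sessions are processed.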

marker stats (flat format)

3ce64d9

ka-bu requested a review from a team as a code owner October 28, 2021 17:03

ka-bu requested review from tayfun, usc-m and a team and removed request for a team and tayfun October 28, 2021 17:03

ka-bu assigned aeshky and ka-bu and unassigned aeshky Oct 28, 2021

ka-bu requested review from aeshky and removed request for a team October 28, 2021 17:03

ka-bu added 7 commits October 29, 2021 09:14

ccodeclimate; dialogue->session; add macro avg

707049b

Merge branch 'main' into marker/stats

475d222

codeclimate

147edfe

Merge branch 'marker/stats' of https://github.com/RasaHQ/rasa into ma…

245e691

…rker/stats

minor (rm unused import; add missing param)

12ed136

minor (rm breakpoint)

4789e10

small fixes (types, forgotten assert)

809e2b4

tayfun reviewed Oct 29, 2021


usc-m reviewed Oct 29, 2021


Apply suggestions from code review

95fe5b9

Co-authored-by: Matthew Summers <m.summers@rasa.com>

aeshky approved these changes Oct 29, 2021


ka-bu added 3 commits October 29, 2021 17:09

wrapper -> protocol; clarify types in tests; adapt process interface

14f914f

separate writing of session and overall stats

a6c200e

Merge branch 'main' into marker/stats

d5cf071

ka-bu requested a review from usc-m October 29, 2021 15:31

usc-m approved these changes Oct 29, 2021


ka-bu enabled auto-merge (squash) October 29, 2021 16:05

ka-bu added 2 commits October 29, 2021 19:06

typing.Protocol -> typing_extensions.Protocol

40f8f22

lint

e8df3d8

ka-bu merged commit 52ad480 into main Oct 29, 2021

ka-bu deleted the marker/stats branch October 29, 2021 18:07
