markers / validation #10012

ka-bu · 2021-10-29T14:34:20Z

Proposed changes:

closes Success Markers: validate config #9964

Overview

"atomic/compound" vs "condition/operator"
- renamed since I guess we will call it the latter in docs and the code was only half-way consistent anyway 😅
tag vs name:
- often, "marker_name" was used to describe a "tag" that must come from a predefined set, while there is also a "name" parameter that can actually be set to something arbitrary --> renamed "marker_name" to "tag" where appropriate
- since "tag" can now be a positive or negative one --> renamed "tag()" to "positive_tag()" since it returns just that
expected number of sub-markers:
- I've split the not marker into its own class since otherwise the expected number of sub-markers would've depended on whether the marker was negated or not ... which I found a bit too confusing
- tests are adapted accordingly already + a test is added to check whether we raise an error if a certain number of markers is expected but one more is passed
config_dict vs from_path
- "from_config_dict" is gone because with it we would not have been able to raise errors that point you to the file in which you have to search for the error -> two-step approach now that first reads the yamls, checks the top level (custom marker names) and retains the links to the files
config format:
- no listing of text parameters under a single condition tag anymore
- no implicit and marker under a top-level custom marker name anymore
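
To illustrate the "expected number of sub-markers" point above, here is a minimal sketch of how a separate not marker can pin its arity while other operators stay unrestricted. The class names mirror the ones discussed in this PR, but the bodies, the `expected_number_of_sub_markers` hook and the error text are assumptions for illustration, not the code in this branch.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class InvalidMarkerConfig(Exception):
    """Stand-in for the real exception type (assumption)."""


class Marker(ABC):
    """Hypothetical minimal base class."""


class ConditionMarker(Marker):
    def __init__(self, text: str) -> None:
        self.text = text


class OperatorMarker(Marker, ABC):
    def __init__(self, markers: List[Marker]) -> None:
        # Each operator declares how many sub-markers it accepts, so the
        # check does not depend on whether a marker is negated.
        expected = self.expected_number_of_sub_markers()
        if expected is not None and len(markers) != expected:
            raise InvalidMarkerConfig(
                f"'{self.positive_tag()}' expects exactly {expected} "
                f"sub-marker(s) but received {len(markers)}."
            )
        self.sub_markers = markers

    @staticmethod
    @abstractmethod
    def positive_tag() -> str:
        ...

    @staticmethod
    def expected_number_of_sub_markers() -> Optional[int]:
        # None means "any number of sub-markers is fine".
        return None


class AndMarker(OperatorMarker):
    @staticmethod
    def positive_tag() -> str:
        return "and"


class NotMarker(OperatorMarker):
    @staticmethod
    def positive_tag() -> str:
        return "not"

    @staticmethod
    def expected_number_of_sub_markers() -> Optional[int]:
        return 1
```

With this shape, passing two sub-markers to `NotMarker` raises `InvalidMarkerConfig`, which is the "one more is passed" case the new test covers.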

TODO

tests and hopefully not many fixes :D
check against domain

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

usc-m · 2021-10-29T14:39:37Z

rasa/core/evaluation/marker_base.py

        return marker


-class CompoundMarker(Marker, ABC):
+class OperatorMarker(Marker, ABC):


👍 prefer this naming, gives us scope to break out validation into true Binary/Unary(/n-ary???) operators down the line by subclassing and makes it clearer what's going on

usc-m · 2021-10-29T15:45:07Z

rasa/core/evaluation/marker_base.py

        """
-        if name == Marker.ANY_MARKER:
+        if name == Marker.ANY_MARKER or name in MarkerRegistry.all_tags:
            raise InvalidMarkerConfig(
                f"You must not use the special marker name {Marker.ANY_MARKER}. "


Need to modify this error message for the case where they used a reserved name from the registry rather than ANY_MARKER

I think there was another place where I wanted to extend the name == Marker.ANY_MARKER check with a name in MarkerRegistry.all_tags check but forgot
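
One way to address the comment above is to split the condition so that each case gets its own message. This is only a sketch: `check_custom_marker_name`, the `ANY_MARKER` value and the `RESERVED_TAGS` set are placeholders standing in for `Marker.ANY_MARKER` and `MarkerRegistry.all_tags`, and the second message is invented for illustration.

```python
from typing import Set


class InvalidMarkerConfig(Exception):
    """Stand-in for the real exception type (assumption)."""


# Placeholder values; the real ones live on Marker and MarkerRegistry.
ANY_MARKER = "<any_marker>"
RESERVED_TAGS: Set[str] = {"and", "or", "not"}


def check_custom_marker_name(
    name: str,
    any_marker: str = ANY_MARKER,
    reserved_tags: Set[str] = RESERVED_TAGS,
) -> None:
    """Raises an error that names the actual reason the name is rejected."""
    if name == any_marker:
        raise InvalidMarkerConfig(
            f"You must not use the special marker name {any_marker}."
        )
    if name in reserved_tags:
        raise InvalidMarkerConfig(
            f"You must not use the reserved tag '{name}' as a marker name."
        )
```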

usc-m · 2021-10-29T15:47:13Z

rasa/core/evaluation/marker_base.py

-        return combined_configs
-
-    @staticmethod
-    def from_config_dict(config: Dict[Text, MarkerConfig]) -> Marker:


Think this will break one of the CLI tests (specifically the one testing the export_marker function on this class)

yep - we lose that but we win nicer debug messages
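
For context on the "nicer debug messages" trade-off, here is a rough sketch of the top-level check in a two-step approach. It assumes the YAML files have already been read into dictionaries keyed by file path; `check_top_level` and its error texts are hypothetical and not the implementation in this PR.

```python
from typing import Any, Dict, Tuple


class InvalidMarkerConfig(Exception):
    """Stand-in for the real exception type (assumption)."""


def check_top_level(
    configs_by_file: Dict[str, Dict[str, Any]]
) -> Dict[str, Tuple[str, Any]]:
    """Maps each custom marker name to (source file, marker config).

    Keeping the file next to each config is what allows later errors to
    point at the file in which the problem has to be fixed.
    """
    seen: Dict[str, Tuple[str, Any]] = {}
    for path, config in configs_by_file.items():
        if not isinstance(config, dict):
            raise InvalidMarkerConfig(
                f"Expected a mapping from marker names to configs in '{path}'."
            )
        for name, marker_config in config.items():
            if name in seen:
                raise InvalidMarkerConfig(
                    f"Marker name '{name}' is defined in both "
                    f"'{seen[name][0]}' and '{path}'."
                )
            seen[name] = (path, marker_config)
    return seen
```

A plain `from_config_dict` would only see the merged dictionary, so it could not say which file a duplicate or malformed entry came from.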

usc-m · 2021-10-29T15:50:30Z

If domain checking needs to be done, it also needs to be hooked up to the CLI (the CLI parameter exists but is unused).

ka-bu · 2021-10-29T15:59:13Z

@aeshky please don't push commits to this branch without asking @usc-m, who has started writing tests already (besides, I think there might have been a better place for deleting those fixtures) ... @usc-m will be back on Tuesday morning and have no meetings so I can review right away :)

Need to work out why set union over a generator expression wasn't working
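
The thread does not say what the actual failure was, so the following is a guess at one common pitfall and not a diagnosis of the code in this branch: `set.union` treats every argument as an iterable of elements, so handing it a generator of sets makes it try to add whole sets as elements. The `slot_sets` data is made up for illustration.

```python
slot_sets = [{"a", "b"}, {"b", "c"}]

# Passing the generator itself: `union` iterates over it and tries to add
# each yielded set as an element, which fails because sets are unhashable.
try:
    set().union(s for s in slot_sets)
    failure = ""
except TypeError as error:
    failure = str(error)

# Unpacking the generator passes each yielded set as its own argument,
# so the elements of all sets end up in the result.
merged = set().union(*(s for s in slot_sets))
```

Starting from `set()` also keeps the call valid when the generator yields nothing.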

ka-bu · 2021-11-02T08:56:59Z

tests/core/evaluation/test_marker.py

-def test_all_operators_in_schema():
-    operators_in_schema = rasa.shared.utils.schemas.markers.OPERATOR_SCHEMA["enum"]
-    operators_in_schema = {tag.lower() for tag in operators_in_schema}
+def collect_slots(marker: Marker) -> Set[Text]:


we can merge these 3 functions into one :)
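
A possible shape for the merged helper, assuming the three functions are near-identical collectors that differ only in the condition type they look for. The marker classes below are simplified stand-ins and `collect_texts` is a hypothetical name.

```python
from typing import List, Set, Type


class Marker:
    """Hypothetical minimal base class."""


class ConditionMarker(Marker):
    def __init__(self, text: str) -> None:
        self.text = text


class SlotSetMarker(ConditionMarker):
    pass


class ActionExecutedMarker(ConditionMarker):
    pass


class OperatorMarker(Marker):
    def __init__(self, sub_markers: List[Marker]) -> None:
        self.sub_markers = sub_markers


def collect_texts(marker: Marker, condition_type: Type[ConditionMarker]) -> Set[str]:
    """Collects the texts of all conditions of the given type in the tree."""
    if isinstance(marker, condition_type):
        return {marker.text}
    if isinstance(marker, OperatorMarker):
        return set().union(
            *(collect_texts(sub, condition_type) for sub in marker.sub_markers)
        )
    return set()
```

The tests could then call `collect_texts(marker, SlotSetMarker)` and so on instead of keeping one function per condition type.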

ka-bu · 2021-11-02T08:57:45Z

rasa/cli/evaluate.py

    markers = Marker.from_path(config)
+
+    if not markers.validate_against_domain(domain):


nice idea to just "bubble up" the warnings and then decide to break it here to gather all warnings first

I was hoping to do it more like a compiler where you can track line number and also collect warnings a bit more elegantly, but it seemed like adding that would involve a lot of complicated stuff around adding information about where markers were parsed from to the markers and I didn't want to do too much in the time we have left

aeshky · 2021-11-02T11:47:19Z

rasa/core/evaluation/marker.py

+        if not valid:
+            rasa.shared.utils.io.raise_warning(
+                f"Referenced action '{self.text}' does not exist in the domain"
+            )


@usc-m and @ka-bu I was creating the QA tests and was curious why this (and the same for intent and slot) is a warning and not an error. Does this mean that the extraction process will continue if there is a mismatch with the domain? What's the advantage of allowing this?

The extraction process isn't managed here - this goes over each marker and decides if they are valid, and if not emits a message, and then passes back True/False if it's valid or not. In the CLI we catch that boolean and error and exit based on it, but in these I wanted a way to collect all of the issues without stopping first, and then stop later (since reporting one at a time would be frustrating for users). The reason it's a warning is mostly because that was my first pass on it. Ideally it would go out as an error without terminating (and then get caught in the CLI to wrap it up and actually terminate) but we don't have a helper function for that so it's a little more involved - definitely something I want to fix before this merges.

The CLI also terminates before the extraction process begins. It loads the marker definitions and the domain, does the validation first, and terminates if issues are found. It will only establish a connection to the tracker store (and by extension perform extraction) if there are no issues
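
The flow described in the two replies above can be sketched as follows. This is a simplified illustration, not the code in `rasa/cli/evaluate.py`: the function names, the plain `logging` calls and the integer exit codes are all assumptions.

```python
import logging
from typing import Iterable, Set

logger = logging.getLogger(__name__)


def validate_actions_against_domain(
    referenced_actions: Iterable[str], domain_actions: Set[str]
) -> bool:
    """Reports every unknown action, then returns whether all were valid."""
    valid = True
    for action in referenced_actions:
        if action not in domain_actions:
            # Report the issue and keep going, so that the user sees all
            # problems in one run instead of one per run.
            logger.error(
                f"Referenced action '{action}' does not exist in the domain"
            )
            valid = False
    return valid


def run(referenced_actions: Iterable[str], domain_actions: Set[str]) -> int:
    """Hypothetical CLI entry point returning an exit code."""
    if not validate_actions_against_domain(referenced_actions, domain_actions):
        # Stop before connecting to the tracker store or extracting anything.
        return 1
    return 0
```

The validation itself never terminates the process. Only the caller decides to stop once all messages have been emitted.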

or in short: #10012 (comment) :D

Looked at how DialogueStateTracker manages this and have replicated it

I like that it's consistent but that approach is also a bit confusing, isn't it? I would've expected a "raise..." if there's a logger error (and usually, I would not explicitly log the error but just have a decorator to pass on the error message)

... on the other hand, getting an error as a result of "only warnings" is also a bit unexpected maybe 😅

Yeah, I figured it was consistent from the perspective of the user, but it might be worth re-evaluating how we do error management across the codebase at some point

* validation; name vs tag; specify expected number of sub-markers;...
* Delete old Marker test fixtures and associated files.
* Fix tests after rebasing main
* First pass on domain validation
* Add some tests around domain validation
  Need to work out why set union over a generator expression wasn't working
* Remove unused markers test data
* Config validation tests
* Fix lint
* add tests for from_path; simplify other tests
* make domain path optional for _run_markers
* Codeclimate fixes
* Codeclimate fixes
* Swap over warnings for errors

Co-authored-by: aeshky <aciel.eshky@gmail.com>
Co-authored-by: Matthew Summers <m.summers@rasa.com>

ka-bu changed the title ~~validation; name vs tag; specify expected number of sub-markers;...~~ markers / validation Oct 29, 2021

ka-bu requested review from usc-m and aeshky October 29, 2021 14:34

usc-m reviewed Oct 29, 2021

ka-bu and others added 3 commits November 1, 2021 14:04

validation; name vs tag; specify expected number of sub-markers;...

3404d26

Delete old Marker test fixtures and associated files.

2c31a5e

Fix tests after rebasing main

41a8ad4

usc-m force-pushed the markers/validate branch from 656e284 to 41a8ad4 Compare November 1, 2021 14:12

usc-m added 5 commits November 1, 2021 14:50

First pass on domain validation

95f4c83

Add some tests around domain validation

a98e985

Need to work out why set union over a generator expression wasn't working

Remove unused markers test data

4231a41

Config validation tests

3e3ce24

Fix lint

ea3b0b3

ka-bu marked this pull request as ready for review November 2, 2021 08:18

ka-bu requested a review from a team as a code owner November 2, 2021 08:18

ka-bu removed the request for review from a team November 2, 2021 08:18

ka-bu commented Nov 2, 2021

ka-bu added 2 commits November 2, 2021 12:13

add tests for from_path; simplify other tests

f33b8c4

make domain path optional for _run_markers

9a7ef5e

aeshky reviewed Nov 2, 2021

usc-m added 4 commits November 2, 2021 13:48

Codeclimate fixes

32e3a4f

Codeclimate fixes

663d651

Merge branch 'main' into markers/validate

c3d972e

Swap over warnings for errors

730689a

ka-bu requested a review from usc-m November 2, 2021 15:24

usc-m approved these changes Nov 2, 2021

ka-bu merged commit 9142bd4 into main Nov 2, 2021

ka-bu deleted the markers/validate branch November 2, 2021 15:32

ka-bu mentioned this pull request Nov 2, 2021

Success Markers: adapt YAML syntax #9997

Closed

3 tasks
