Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

markers / validation #10012

Merged
merged 14 commits into from
Nov 2, 2021
Merged

markers / validation #10012

merged 14 commits into from
Nov 2, 2021

Conversation

ka-bu
Copy link
Contributor

@ka-bu ka-bu commented Oct 29, 2021

Proposed changes:

Overview

  • "atomic/compound" vs "condition/operator"
    • renamed since I guess we will call it the latter in docs and the code was only half-way consistent anyway 😅
  • tag vs name:
    • often, "marker_name" was used to describe a "tag" that must be of a predefined set, while there is also a "name" parameter that can actually set to something arbitrary --> renamed "marker_name" to "tag" where appropriate
    • since "tag" can now be a positive or negative one --> renamed "tag()" to "positive_tag()" since it returns just that
  • expected number of sub-markers:
    • I've split the not marker into it's own class since otherwise the expected number of sub-markers would've depended on whether the marker was negated or not .... which I found a bit too confusing
    • tests are adapted accordingly already + a test is added to check whether we raise an error if a certain number of markers is expected but one more is passed
  • config_dict vs from_path
    • the "from_config_dict" is gone because we would not have been able to raise errors that point you to the right file where you have to search for the error -> two step approach now that first reads the yamls, checks the top level (custom marker names) and retains the links to the files
  • config format:
    • no listing of text parameters under a single condition tag anymore
    • no implicit and marker under a top-level custom marker name anymore

TODO

  • tests and hopefully not many fixes :D
  • check against domain

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@ka-bu ka-bu changed the title validation; name vs tag; specify expected number of sub-markers;... markers / validation Oct 29, 2021
@ka-bu ka-bu requested review from usc-m and aeshky October 29, 2021 14:34
return marker


class CompoundMarker(Marker, ABC):
class OperatorMarker(Marker, ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 prefer this naming, gives us scope to break out validation into true Binary/Unary(/n-ary???) operators down the line by subclassing and makes it clearer what's going on

"""
if name == Marker.ANY_MARKER:
if name == Marker.ANY_MARKER or name in MarkerRegistry.all_tags:
raise InvalidMarkerConfig(
f"You must not use the special marker name {Marker.ANY_MARKER}. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to modify this in the case they used a reserved name in the registry rather than ANY_MARKER

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think there was another case where I wanted to add the name == Marker.ANY_MARKER to some name in MarkerRegistry.all_tag check but forgot

return combined_configs

@staticmethod
def from_config_dict(config: Dict[Text, MarkerConfig]) -> Marker:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this will break one of the CLI tests (specifically the one testing the export_marker function on this class)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep - we lose that but we win nicer debug messages

@usc-m
Copy link
Contributor

usc-m commented Oct 29, 2021

If domain checking needs done that needs hooked up to the CLI as well (the CLI parameter exists but is unused).

@ka-bu
Copy link
Contributor Author

ka-bu commented Oct 29, 2021

@aeshky please don't push commits to this branch without asking @usc-m , who has started writing tests already (besides I think there might have been a better place for the deleting those fixtures) ... @usc-m will be back on Tuesday morning and have no meetings so I can review right away :)

@ka-bu ka-bu marked this pull request as ready for review November 2, 2021 08:18
@ka-bu ka-bu requested a review from a team as a code owner November 2, 2021 08:18
@ka-bu ka-bu removed the request for review from a team November 2, 2021 08:18
def test_all_operators_in_schema():
operators_in_schema = rasa.shared.utils.schemas.markers.OPERATOR_SCHEMA["enum"]
operators_in_schema = {tag.lower() for tag in operators_in_schema}
def collect_slots(marker: Marker) -> Set[Text]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can merge these 3 functions into one :)

markers = Marker.from_path(config)

if not markers.validate_against_domain(domain):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice idea to just "bubble up" the warnings and then decide to break it here to gather all warnings first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to do it more like a compiler where you can track line number and also collect warnings a bit more elegantly, but it seemed like adding that would involve a lot of complicated stuff around adding information about where markers were parsed from to the markers and I didn't want to do too much in the time we have left

Comment on lines 160 to 163
if not valid:
rasa.shared.utils.io.raise_warning(
f"Referenced action '{self.text}' does not exist in the domain"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@usc-m and @ka-bu I was creating the QA tests and was curious why this (and the same for intent and slot) is a warning and not an error. Does this mean that the extraction process will continue if there is a mismatch with the domain? What's the advantage of allowing this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extraction process isn't managed here - this goes over each marker and decides if they are valid, and if not emits a message, and then passes back True/False if it's valid or not. In the CLI we catch that boolean and error and exit based on it, but in these I wanted a way to collect all of the issues without stopping first, and then stop later (since reporting one at a time would be frustrating for users). The reason it's a warning is mostly because that was my first pass on it. Ideally it would go out as an error without terminating (and then get caught in the CLI to wrap it up and actually terminate) but we don't have a helper function for that so it's a little more involved - definitely something I want to fix before this merges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI also terminates before the extraction process begins. It loads the marker definitions and the domain, does the validation first, and terminates if issues are found. It will only establish a connection to the tracker store (and by extension perform extraction) if there are no issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or in short: #10012 (comment) :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked at how DialogueStateTracker manages this and have replicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that it's consistent but that approach is also a bit confusing, isn't it? I would've expected a "raise..." if there's a logger error (and usually, I would not explicitly log the error but just have a decorator to pass on the error message)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... on the other hand, getting an error as a result of "only warnings" is also a bit unexpected maybe 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured it was consistent from the perspective of the user, but it might be worth re-evaluating how we do error management across the codebase at some point

@ka-bu ka-bu requested a review from usc-m November 2, 2021 15:24
@ka-bu ka-bu merged commit 9142bd4 into main Nov 2, 2021
@ka-bu ka-bu deleted the markers/validate branch November 2, 2021 15:32
@ka-bu ka-bu mentioned this pull request Nov 2, 2021
3 tasks
indam23 pushed a commit that referenced this pull request Jul 27, 2022
* validation; name vs tag; specify expected number of sub-markers;...

* Delete old Marker test fixtures and associated files.

* Fix tests after rebasing main

* First pass on domain validation

* Add some tests around domain validation

Need to work out why set union over a generator expression wasn't working

* Remove unused markers test data

* Config validation tests

* Fix lint

* add tests for from_path; simplify other tests

* make domain path optional for _run_markers

* Codeclimate fixes

* Codeclimate fixes

* Swap over warnings for errors

Co-authored-by: aeshky <aciel.eshky@gmail.com>
Co-authored-by: Matthew Summers <m.summers@rasa.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Success Markers: validate config
3 participants