Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to use or with slot values #9317

Merged
merged 25 commits into from
Sep 17, 2021
Merged

Make it possible to use or with slot values #9317

merged 25 commits into from
Sep 17, 2021

Conversation

alwx
Copy link
Contributor

@alwx alwx commented Aug 10, 2021

Proposed changes:

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@alwx alwx changed the title Or slot was set Make it possible to use or with slot values Aug 10, 2021
@alwx alwx requested a review from joejuzl August 10, 2021 12:19
@alwx alwx marked this pull request as ready for review August 10, 2021 12:19
@alwx alwx requested a review from a team as a code owner August 10, 2021 12:19
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 💯 I've only left a few nit-picking comments, nothing too major.
I was also wondering if the task regarding rasa.shared.core.training_data.visualization modification in the Definition of Done was set aside? or something you have to follow-up on?

@alwx alwx removed the request for review from joejuzl August 12, 2021 08:07
Copy link
Contributor

@joejuzl joejuzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!
I think it would be really cool if we could test this through the TrainingDataGenerator.
So we can see that the actual trackers are created as we expect (test_common_story_reader.py is probably the most similar).

Also we need to make sure that story visualisation still works correctly (it's used by Rasa X).

@alwx alwx requested a review from ancalita August 17, 2021 08:48
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 👍 I think the visualisation module also needs updating.

},
)
assert tracker.events[2] == ActionExecuted("utter_default")
assert tracker.events[3] == SlotSet(key="name", value="joe")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a missing newline at the end here.
Also, would index 4 in tracker.events also be a SlotSet event, the one with value = bob?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good assumption but no — however, I am not sure why. My assumption was that there should be two trackers and trackers[0].events[3] == SlotSet(key="name", value="joe") and trackers[1].events[3] == SlotSet(key="name", value="bob") but that doesn't happen.
I was thinking that's an issue but it doesn't happen with or expressions when the intents (not SlotSets) are used, and it was like that before all these changes — so I now wonder if it's the bug or I just got that all wrong.
Maybe somebody can clarify? @joejuzl or @wochinge or maybe even someone from research

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good catch @ancalita ! It should be two trackers - one with "joe" and one with "bob". If you do the or with intents you also get two trackers. This might require some changes in the TrackerGenerator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be handled right @alwx ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to go now — I rebased the PR: there is no common_story_reader.py anymore, and the test to check if story steps get copied or not has already been added to test_yaml_story_reader.py (e.g. test_or_statement_with_slot_was_set).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rebasing just deleted this test. We still need a test which asserts that the generator creates two trackers out of the story with the or slot. So you basically need a test which asserts that training.load_data causes 2 trackers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally added a test for this! (test_or_statement_story_with_or_slot_was_set)

@alwx alwx mentioned this pull request Aug 18, 2021
16 tasks
@alwx alwx requested review from joejuzl and ancalita August 18, 2021 15:43
@alwx
Copy link
Contributor Author

alwx commented Aug 18, 2021

Just checked it: slot_was_set doesn't affect the visualization at all — so nothing needs to be adapted in rasa.shared.core.training_data.visualization

},
)
assert tracker.events[2] == ActionExecuted("utter_default")
assert tracker.events[3] == SlotSet(key="name", value="joe")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be handled right @alwx ?

@alwx alwx requested a review from joejuzl September 7, 2021 12:18
@alwx alwx requested a review from wochinge September 10, 2021 10:40
@@ -398,8 +398,8 @@ def test_db_url_with_query_from_endpoint_config(tmp_path: Path):
driver: my-driver
another: query
"""
f = tmp_path / "tmp_config_file.yml"
f.write_text(endpoint_config)
stories_path = tmpdir / "stories.yml"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why these modifications were needed? In addition to the code quality failings, the test also fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it was some post-merge stuff, sorry 🙈
Removed it!

steps[0].as_story_string()
== """
## hello world
* slot{"slot_was_set": [{"name": "joe"}]} OR slot{"slot_was_set": [{"name": "bob"}]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that actually the CI expects this:
slot{"name": "joe"} OR slot{"name": "bob"} 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this

@alwx alwx requested a review from ancalita September 14, 2021 07:28
- action: some_action
"""

reader = YAMLStoryReader(is_used_for_training=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a test here when the is_used_for_training flag is True too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tested in another test (test_or_statement_with_slot_was_set_is_used_for_training) — in this case it makes no sense 'cause it will break the logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by it will break the logic? You mean the method as_story_string() can't be called for each of the 2 steps?

Copy link
Contributor Author

@alwx alwx Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because when you specify that is_used_for_training=True then checkpoints are getting added and trackers are getting copied (see add_stories method in StoryStepBuilder, line 110). As a result, reader.read_from_parsed_yaml(yaml_content) will create multiple trackers and calling .as_story_string on them will lead to results like that which are meaningless:

## hello world
    - slot{"name": "joe"}
> GENR_OR_b2ae7

However, this result is important when these trackers are used for training, and that case is already tested in test_or_statement_with_slot_was_set_is_used_for_training

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the deep-dive!

@alwx alwx requested a review from ancalita September 14, 2021 08:32
Copy link
Member

@ancalita ancalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🙌🏼


Args:
events: Events that need to be added.
is_used_for_training: Identifies if it's a part of OR statement.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore if we no longer support MD?

def _generate_checkpoint_name_for_or_statement(
self, messages: List[UserUttered]
) -> str:
def _generate_checkpoint_name_for_or_statement(self, messages: List[Event]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method refers to messages_texts_or_intents - does this work for all events?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The __str__ method is definied for all the events so it does because it requires only that. However, the names should be changed.

@alwx alwx requested a review from joejuzl September 14, 2021 16:21
),
],
)
async def test_story_with_retrieval_intent_warns(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not expect this warning any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returned it back — in this case we can actually just remove the optional argument for YAMLStoryReader to test exactly the same warning.

assert len(steps) == 1

assert (
steps[0].as_story_string()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this no longer used?

Copy link
Contributor Author

@alwx alwx Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't. is_used_for_training=False can no longer be used so the tracker(s) generated by StoryReader can only be used for training.
It makes sense because previously the ability to generate trackers not for training purposes was linked to conversion between Markdown and Yaml.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So can we delete as_story_string() as part of removing mark down support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems to be used by the interactive learning so I would skip that for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we keep the test then?

steps[0].as_story_string()
== """
## hello world
* slot{"name": "joe"} OR slot{"name": "bob"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@alwx alwx requested a review from joejuzl September 15, 2021 09:16
Copy link
Contributor

@joejuzl joejuzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make it possible to OR slot values - Implementation
4 participants