This repository was archived by the owner on Jan 15, 2025. It is now read-only.


resolve cross train issues #814

Merged

feich-ms merged 15 commits into master from feich/fixCrossTrainBugs on Jun 2, 2020

Conversation

@feich-ms
Contributor

@feich-ms feich-ms commented May 21, 2020

Resolves the cross-train related issues reported in #800:

  • Cross train

    • --config is relative to --in and not relative to the pwd()
      Resolved: fixed in this PR for both luis and qnamaker cross-train

    • Cross-train config is file name case sensitive.
      Resolved: fixed in this PR, with tests added to cover it. All file ids in the config and the luObject array are lower-cased

    • Cross-train should only copy over files specified in the cross-train config to output, but all references should be fully resolved before they are copied over.
      Resolved: fixed in this PR, with tests adjusted to cover it. Only the .lu files and corresponding .qna files listed in the config are written to the --out folder

    • QnA meta-data property is not applied to references
      Resolved: fixed in this PR. All references from imports are resolved into the current content, and meta-data is applied to them as well

    • Apply cross-train for .lu and .qna documents after fully resolving imports.
      Resolved: fixed in this PR. All imports are fully resolved

    • If multiple source .lu or .qna files pull in the same reference, then for QnA, meta-data pairs need to be added to the same QnA pair (e.g. root as well as dialog A pulling in some chitchat utterances)
      Resolved: fixed automatically once imports are resolved. Note that QnA Maker does not allow a single meta-data key to carry multiple values within one KB pair, so if the same key appears with two different values, two KB pairs are generated

    • Cross-train does not work when the child dialog does not have any LU content. AllowInterruption=true on that dialog does not seem to work.
      Resolved: fixed in this PR to support cross-training empty files: an empty .lu or .qna file can still produce cross-trained results.

All above issues are covered by unit tests.
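For context on the --config fix above, here is a minimal sketch of what a cross-train config looks like (file names and the trigger intent are hypothetical); the point of the first fix is that the relative paths in this file now resolve against --in rather than the current working directory:

```json
{
    "./main/main.lu": {
        "rootDialog": true,
        "triggers": {
            "BookFlightIntent": "./bookFlight/bookFlight.lu"
        }
    },
    "./bookFlight/bookFlight.lu": {
        "triggers": {}
    }
}
```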

@feich-ms feich-ms marked this pull request as ready for review May 27, 2020 08:04
@feich-ms feich-ms requested a review from munozemilio as a code owner May 27, 2020 08:04
@feich-ms feich-ms requested review from boydc2014 and vishwacsena May 27, 2020 08:05
@vishwacsena
Contributor

@feich-ms I tried this. I'd like to use this PR to iterate on refinements. Can you try this sample (follow readme to do cross-train as well as luis:build) and figure out why the output luis appIds are not printed on screen (without --out specified)?

I also see "The model name { personName } are reserved" without any description of what is causing it. We probably need these errors to be correlatable back to the specific .lu file that causes them. Can you help figure out what's causing it? Depending on the cause, we probably need to investigate adding additional validations to BF-LU so we do not waste time posting content up to LUIS and instead catch these errors upfront.

Contributor

@vishwacsena vishwacsena left a comment

🚫

@feich-ms
Contributor Author

feich-ms commented May 28, 2020

@vishwacsena, figured out the cause and pushed a fix here: remove patterns with prebuilt entity from cross-trained _Interruption intent (32c39a8).

  1. The luis appIds missing from the screen are caused by publishing failures of some files (here, the "The model name { personName } are reserved" error); the appIds are never created because of the failure.
  2. The "The model name { personName } are reserved" error is caused by patterns with prebuilt entities not being removed from the _Interruption intent. I had already fixed this issue but missed a corner case where the prebuilt entity has a role. The commit above fixes this.
  3. The "The model name {xxx} are reserved" error can also happen in intents other than the interruption intent, if xxx is a prebuilt entity that the user has not explicitly defined with @ prebuilt xxx. I think we can improve bf-lu to validate that prebuilt entities used in patterns are defined explicitly, and otherwise throw a friendlier error message. Here is the commit I pushed for that: validate prebuilt entities in pattern having explicit definitions (a893de1).
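To illustrate point 3, here is a hypothetical .lu fragment (intent and utterance invented for illustration): the pattern references the prebuilt personName entity, and without the explicit @ prebuilt declaration LUIS rejects it with the reserved-name error.

```
# BookFlight
> pattern utterance: {personName} here refers to the prebuilt entity
- book a flight for {personName}

> without this explicit definition, publishing fails with
> "The model name { personName } are reserved"
@ prebuilt personName
```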

@vishwacsena
Contributor

Thanks @feich-ms. Can you help try this sample E2E? I just pulled your latest and tried it, and I see an empty QnA Maker KB although qnamaker:build says everything was published.

@feich-ms
Contributor Author

@vishwacsena, just tried it and indeed there is no content in the KB. When I reduce the QA pairs and questions in chitchat.qna, it works, so I'm assuming QnA has a limit on the number of questions in a single KB pair, since our deferToLuis pair inherits the parent's QnA questions (here is chitchat.qna). I looked at the cross-trained qna files: in two of them the deferToLuis pair has more than a thousand questions, so that is probably the root cause.

@vishwacsena
Contributor

vishwacsena commented May 28, 2020

Aah, yes. Could we add logic to break any of these into multiple pairs with the same answer, just so we are not past the qna limits? https://docs.microsoft.com/en-us/azure/cognitive-services/qnamaker/limits#knowledge-base-content-limits

I mean, in cross-train we could validate the qna pairs and split them up if needed.

It is also odd that the service does not return an error, or does it?

@feich-ms
Contributor Author

Seems it didn't throw any errors; I will double-check. I will also break large qna pairs into smaller ones today.

feich-ms added 2 commits May 29, 2020 14:15
…terances and optimize luConverter to make sure there is only one whitespace between words in utterances
…er in cross train and fix error not thrown issue in qnamaker
@feich-ms
Contributor Author

feich-ms commented May 29, 2020

@vishwacsena, I added logic in cross-train to break large QA pairs into multiple smaller ones. I also figured out that the limit on the number of questions per answer in the replace API seems to be 1000 instead of 300. I tested it manually: if the question count is over 1000 (e.g. 1001), it throws an error like the one below; 1000 or fewer works fine. Unit tests are also added for the split logic.

[screenshot: error response returned when the question count exceeds 1000]

I also fixed the logic for surfacing errors from the API. Any API call failure is now thrown to the console, as in the screenshot above.
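The splitting logic described above could be sketched roughly like this (this is an illustration, not the actual bf-lu implementation; the pair shape and helper name are assumptions):

```javascript
// Sketch: split a QnA pair whose question list exceeds the per-answer
// service limit into multiple pairs sharing the same answer and metadata.
const MAX_QUESTIONS_PER_ANSWER = 1000;

function splitQnaPair(qnaPair, limit = MAX_QUESTIONS_PER_ANSWER) {
  // Pairs within the limit pass through untouched.
  if (qnaPair.questions.length <= limit) return [qnaPair];

  const pairs = [];
  for (let i = 0; i < qnaPair.questions.length; i += limit) {
    // Copy every field (answer, metadata, source, ...) and replace
    // only the question slice for this chunk.
    pairs.push({ ...qnaPair, questions: qnaPair.questions.slice(i, i + limit) });
  }
  return pairs;
}
```

For example, a deferToLuis pair with 2500 inherited questions would become three pairs of 1000, 1000, and 500 questions, all with the same answer.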

@vishwacsena
Contributor

@feich-ms can you explain why we need to lowercase the file names? This is hard to explain to the user, because the recognizer configuration no longer matches (in casing) once we lower-case the file names.

@vishwacsena
Contributor

Rest looks good.

@feich-ms
Contributor Author

@vishwacsena, the lower-casing was introduced when resolving the issue "Cross-train config is file name case sensitive". I will optimize the logic to avoid lower-casing file names.

@feich-ms
Contributor Author

feich-ms commented Jun 1, 2020

@vishwacsena, the file name casing issue is resolved by the latest commit to cross-train. It now writes out the cross-trained content with the original file names (no lower-casing), while the PR still resolves the issue "Cross-train config is file name case sensitive": if the file names in your cross-train config differ from the real file names only in casing, the cross-train CLI still works. Please let me know if you hit more issues.
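One way the behavior described above could look (a sketch only; the object shape and function name are assumptions, not the actual bf-lu code): match config file ids against parsed lu objects case-insensitively, but keep the original id untouched so output file names preserve their casing.

```javascript
// Sketch: resolve a file id from the cross-train config against parsed
// lu objects without regard to case, returning the object with its
// original id (and therefore its original output file name) intact.
function findByIdIgnoreCase(luObjects, configId) {
  const target = configId.toLowerCase();
  return luObjects.find(obj => obj.id.toLowerCase() === target);
}
```

With this approach a config entry "main.lu" still resolves a file named "Main.lu", and the cross-trained output is written as "Main.lu".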

Contributor

@vishwacsena vishwacsena left a comment

Verified changes functionally.

@vishwacsena
Contributor

Thanks @feich-ms. All looks good. Approved.
