This repository was archived by the owner on Jan 15, 2025. It is now read-only.


resolve cross train issues #814

Merged

feich-ms merged 15 commits into master from feich/fixCrossTrainBugs on Jun 2, 2020

Conversation

@feich-ms
Contributor

@feich-ms feich-ms commented May 21, 2020

Resolves the cross-train related issues reported in #800:

  • Cross train

    • --config is relative to --in and not relative to the pwd()
      Resolved: fixed in this PR for both luis and qnamaker cross-train

    • Cross-train config is file name case sensitive.
      Resolved: fixed in this PR, with tests added to cover it. All file ids in the config and the luObject array are lower-cased

    • Cross-train should only copy over files specified in the cross-train config to output, but all references should be fully resolved before they are copied over.
      Resolved: fixed in this PR, with tests adjusted to cover it. Only the .lu files and corresponding .qna files listed in the config are written to the --out folder

    • QnA meta-data property is not applied to references
      Resolved: fixed in this PR. All references from imports are resolved into the current content, and meta-data is applied to them as well

    • Apply cross-train for .lu and .qna documents after fully resolving imports.
      Resolved: fixed in this PR. All imports are fully resolved

    • If multiple source .lu or .qna files pull in the same reference, then for QnA, meta-data pairs need to be added to the same QnA pair (e.g. root as well as dialog A pulling in some chitchat utterances)
      Resolved: fixed automatically once imports are resolved. Note that QnA Maker does not allow a single meta-data key to carry multiple values within one KB pair, so if the same key appears with two different values, two KB pairs are generated

    • Cross-train does not work when the child dialog does not have any LU content. AllowInterruption=true on that dialog does not seem to work.
      Resolved: fixed in this PR to support cross-training empty files: an empty .lu or .qna file can still produce cross-trained results.

All above issues are covered by unit tests.
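For context on the --config fix above, here is a minimal sketch of what a cross-train config looks like (file names and the trigger intent are hypothetical); the point of the first fix is that the relative paths in this file now resolve against --in rather than the current working directory:

```json
{
    "./main/main.lu": {
        "rootDialog": true,
        "triggers": {
            "BookFlightIntent": "./bookFlight/bookFlight.lu"
        }
    },
    "./bookFlight/bookFlight.lu": {
        "triggers": {}
    }
}
```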

@feich-ms feich-ms marked this pull request as ready for review May 27, 2020 08:04
@feich-ms feich-ms requested a review from munozemilio as a code owner May 27, 2020 08:04
@feich-ms feich-ms requested review from boydc2014 and vishwacsena May 27, 2020 08:05
@vishwacsena
Contributor

@feich-ms I tried this. I'd like to use this PR to iterate on refinements. Can you try this sample (follow readme to do cross-train as well as luis:build) and figure out why the output luis appIds are not printed on screen (without --out specified)?

I also see "The model name { personName } are reserved" without any description of what is causing it. We probably need these errors to be correlatable back to the specific .lu file that causes them. Can you help figure out what's causing it? Depending on the cause, we probably need to investigate adding additional validations to BF-LU so we do not waste time posting content up to LUIS and instead catch these errors upfront.

Contributor

@vishwacsena vishwacsena left a comment

🚫

@feich-ms
Contributor Author

feich-ms commented May 28, 2020

@vishwacsena, figured out the cause and pushed a fix here: remove patterns with prebuilt entity from cross-trained _Interruption intent (32c39a8).

  1. The luis appIds missing from the screen are caused by publishing failures of some files (here, the "The model name { personName } are reserved" error); the appIds are never created because of the failure.
  2. The "The model name { personName } are reserved" error is caused by patterns with prebuilt entities not being removed from the _Interruption intent. I had already fixed this issue but missed a corner case where the prebuilt entity has a role. The commit above fixes this.
  3. The "The model name {xxx} are reserved" error can also happen in intents other than the interruption intent, if xxx is a prebuilt entity that the user has not explicitly defined with @ prebuilt xxx. I think we can improve bf-lu to validate that prebuilt entities used in patterns are defined explicitly, and otherwise throw a friendlier error message. Here is the commit I pushed for that: validate prebuilt entities in pattern having explicit definitions (a893de1).
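To illustrate point 3, here is a hypothetical .lu fragment (intent and utterance invented for illustration): the pattern references the prebuilt personName entity, and without the explicit @ prebuilt declaration LUIS rejects it with the reserved-name error.

```
# BookFlight
> pattern utterance: {personName} here refers to the prebuilt entity
- book a flight for {personName}

> without this explicit definition, publishing fails with
> "The model name { personName } are reserved"
@ prebuilt personName
```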

@vishwacsena
Contributor

Thanks @feich-ms. Can you help try this sample E2E? I just pulled your latest and tried it, and I see an empty QnA Maker KB although qnamaker:build says everything was published.

@feich-ms
Contributor Author

@vishwacsena, just tried it and indeed there is no content in the KB. When I reduce the QA pairs and questions in chitchat.qna, it works, so I'm assuming QnA has a limit on the number of questions in a single KB pair, since our deferToLuis pair inherits the parent's QnA questions (here is chitchat.qna). I looked at the cross-trained qna files: in two of them the deferToLuis pair has more than a thousand questions, so that is probably the root cause.

@vishwacsena
Contributor

vishwacsena commented May 28, 2020

Aah, yes. Could we add logic to break any of these into multiple pairs with the same answer, just so we are not past the qna limits? https://docs.microsoft.com/en-us/azure/cognitive-services/qnamaker/limits#knowledge-base-content-limits

I mean, in cross-train we could validate the qna pairs and split them up if needed.

It is also odd that the service does not return an error, or does it?

@feich-ms
Contributor Author

Seems it didn't throw any errors; I will double-check. I will also break large qna pairs into smaller ones today.

feich-ms added 2 commits May 29, 2020 14:15
…terances and optimize luConverter to make sure there is only one whitespace between words in utterances
…er in cross train and fix error not thrown issue in qnamaker
@feich-ms
Contributor Author

feich-ms commented May 29, 2020

@vishwacsena, I added logic in cross-train to break large QA pairs into multiple smaller ones. I also figured out that the limit on the number of questions per answer in the replace API seems to be 1000 instead of 300. I tested it manually: if the question count is over 1000 (e.g. 1001), it throws an error like the one below; 1000 or fewer works fine. Unit tests are also added for the split logic.

[screenshot: error response returned when the question count exceeds 1000]

I also fixed the logic for surfacing errors from the API. Any API call failure is now thrown to the console, as in the screenshot above.
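The splitting logic described above could be sketched roughly like this (this is an illustration, not the actual bf-lu implementation; the pair shape and helper name are assumptions):

```javascript
// Sketch: split a QnA pair whose question list exceeds the per-answer
// service limit into multiple pairs sharing the same answer and metadata.
const MAX_QUESTIONS_PER_ANSWER = 1000;

function splitQnaPair(qnaPair, limit = MAX_QUESTIONS_PER_ANSWER) {
  // Pairs within the limit pass through untouched.
  if (qnaPair.questions.length <= limit) return [qnaPair];

  const pairs = [];
  for (let i = 0; i < qnaPair.questions.length; i += limit) {
    // Copy every field (answer, metadata, source, ...) and replace
    // only the question slice for this chunk.
    pairs.push({ ...qnaPair, questions: qnaPair.questions.slice(i, i + limit) });
  }
  return pairs;
}
```

For example, a deferToLuis pair with 2500 inherited questions would become three pairs of 1000, 1000, and 500 questions, all with the same answer.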

@vishwacsena
Contributor

@feich-ms can you explain why we need to lowercase the file names? This is hard to explain to the user, because the recognizer configuration no longer matches (in casing) once we lower-case the file names.

@vishwacsena
Contributor

Rest looks good.

@feich-ms
Contributor Author

@vishwacsena, the lower-casing was introduced when resolving the issue "Cross-train config is file name case sensitive". I will optimize the logic to avoid lower-casing file names.

@feich-ms
Contributor Author

feich-ms commented Jun 1, 2020

@vishwacsena, the file name casing issue is resolved by the latest commit to cross-train. It now writes out the cross-trained content with the original file names (no lower-casing), while the PR still resolves the issue "Cross-train config is file name case sensitive": if the file names in your cross-train config differ from the real file names only in casing, the cross-train CLI still works. Please let me know if you hit more issues.
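One way the behavior described above could look (a sketch only; the object shape and function name are assumptions, not the actual bf-lu code): match config file ids against parsed lu objects case-insensitively, but keep the original id untouched so output file names preserve their casing.

```javascript
// Sketch: resolve a file id from the cross-train config against parsed
// lu objects without regard to case, returning the object with its
// original id (and therefore its original output file name) intact.
function findByIdIgnoreCase(luObjects, configId) {
  const target = configId.toLowerCase();
  return luObjects.find(obj => obj.id.toLowerCase() === target);
}
```

With this approach a config entry "main.lu" still resolves a file named "Main.lu", and the cross-trained output is written as "Main.lu".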

Contributor

@vishwacsena vishwacsena left a comment

Verified changes functionally.

@vishwacsena
Contributor

Thanks @feich-ms. All looks good. Approved.
