Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression Test Failure Comment #10682

Merged
merged 9 commits into from
Jan 14, 2022
Merged

Conversation

markus-hinsche
Copy link
Contributor

@markus-hinsche markus-hinsche commented Jan 13, 2022

Motivation: Since #10574, summary comments after a MR test run are gone, if a job is failing

Proposed changes:

  • Add comment also in case of (partial) failure (also send email in the same cases)
  • GH labels (status:model-regression-tests and runner: gpu) are now removed in both cases: success and failure
  • result.json artifact contains (only) results from those jobs that succeeded
  • Implementation: Introduce set_job_success_status job to set the overall run status to failed if one of the jobs failed
  • keep on-schedule as is (doesn't have comments)

Checked:

  • Case: only fail
  • Case: all succeed
  • Case: some fail
  • works for CPU and GPU

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

@github-actions
Copy link
Contributor

Success status of the run:

Commit: 6efb166, The full report is available as an artifact.

@github-actions
Copy link
Contributor

Success status of the run: Failed

Commit: 905d413, The full report is available as an artifact.

@github-actions
Copy link
Contributor

Status of the run: Failed

Commit: e03333a, The full report is available as an artifact.

@github-actions
Copy link
Contributor

Status of the run: Succeeded

Commit: e03333a, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

Configuration Intent Classification Micro F1 Entity Recognition Micro F1 Response Selection Micro F1
Sparse + BERT + DIET(seq) + ResponseSelector(t2t)
test: 1m26s, train: 2m51s, total: 4m17s
1.0000 (0.00) 0.8800 (0.00) no data

@github-actions
Copy link
Contributor

Status of the run: Failed

Commit: b389ec1, The full report is available as an artifact.

@github-actions
Copy link
Contributor

Status of the run: Failed

Commit: b389ec1, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

Configuration Intent Classification Micro F1 Entity Recognition Micro F1 Response Selection Micro F1
Sparse + BERT + DIET(seq) + ResponseSelector(t2t)
test: 1m13s, train: 2m57s, total: 4m9s
1.0000 (0.00) 0.8800 (0.00) no data

@github-actions
Copy link
Contributor

Status of the run: Failed

Commit: 08eb052, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

Configuration Intent Classification Micro F1 Entity Recognition Micro F1 Response Selection Micro F1
Sparse + BERT + DIET(seq) + ResponseSelector(t2t)
test: 1m39s, train: 4m22s, total: 6m1s
1.0000 (0.00) 0.8800 (0.00) no data

@github-actions
Copy link
Contributor

Status of the run: Failed

Commit: 7a1c76b, The full report is available as an artifact.

@github-actions
Copy link
Contributor

Hey @markus-hinsche! 👋 To run model regression tests, comment with the /modeltest command and a configuration.

Tips 💡: The model regression test will be run on push events. You can re-run the tests by re-add status:model-regression-tests label or use a Re-run jobs button in Github Actions workflow.

Tips 💡: Every time when you want to change a configuration you should edit the comment with the previous configuration.

You can copy this in your comment and customize:

/modeltest

```yml
##########
## Available datasets
##########
# - "Carbon Bot" (NLU)
# - "Hermit" (NLU)
# - "Private 1" (NLU)
# - "Private 2" (NLU)
# - "Private 3" (NLU)
# - "Sara" (NLU, Core)
# - "financial-demo" (NLU, Core)
# - "helpdesk-assistant" (NLU, Core)
# - "insurance-demo" (NLU, Core)
# - "retail-demo" (NLU, Core)

##########
## Available NLU configurations
##########
# - "BERT + DIET(bow) + ResponseSelector(bow)"
# - "BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Spacy + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + BERT + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"

##########
## Available Core configurations
##########
# - "Rules"
# - "Rules + AugMemo"
# - "Rules + AugMemo + TED"
# - "Rules + Memo"
# - "Rules + Memo + TED"
# - "Rules + TED"

## Example configuration
#################### syntax #################
## include:
##   - dataset: ["<dataset_name>"]
##     config: ["<configuration_name>"]
#
## Example:
## include:
##  - dataset: ["Carbon Bot"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Shortcut:
## You can use the "all" shortcut to include all available configurations or datasets
#
## Example: Use the "Sparse + EmbeddingIntent + ResponseSelector(bow)" configuration
## for all available datasets
## include:
##  - dataset: ["all"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Example: Use all available configurations for the "Carbon Bot" and "Sara" datasets
## and for the "Hermit" dataset use the "Sparse + DIET + ResponseSelector(T2T)" and
## "BERT + DIET + ResponseSelector(T2T)" configurations:
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##  - dataset: ["Hermit"]
##    config: ["Sparse + DIET(seq) + ResponseSelector(t2t)", "BERT + DIET(seq) + ResponseSelector(t2t)"]
#
## Example: Define a branch name to check-out for a dataset repository. Default branch is 'main'
## dataset_branch: "test-branch"
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##
## Shortcuts:
## You can use the "all" shortcut to include all available configurations or datasets.
## You can use the "all-nlu" shortcut to include all available NLU configurations or datasets.
## You can use the "all-core" shortcut to include all available core configurations or datasets.

include:
 - dataset: ["Carbon Bot"]
   config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]

```

@github-actions
Copy link
Contributor

/modeltest

include:
 - dataset: ["financial-demo", "helpdesk-assistant"]
   config: ["Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"]

@github-actions
Copy link
Contributor

The model regression tests have started. It might take a while, please be patient.
As soon as results are ready you'll see a new comment with the results.

Used configuration can be found in the comment.

@github-actions
Copy link
Contributor

Status of the run: Failed

Commit: 7a1c76b, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

Configuration Intent Classification Micro F1 Entity Recognition Micro F1 Response Selection Micro F1
Sparse + BERT + DIET(seq) + ResponseSelector(t2t)
test: 1m3s, train: 3m25s, total: 4m27s
1.0000 (0.00) 0.8800 (0.00) no data

@markus-hinsche markus-hinsche marked this pull request as ready for review January 13, 2022 14:23
@markus-hinsche markus-hinsche requested a review from a team as a code owner January 13, 2022 14:23
@markus-hinsche markus-hinsche requested review from tczekajlo, dakshvar22 and ka-bu and removed request for a team January 13, 2022 14:23
Copy link

@tczekajlo tczekajlo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🍏

@@ -675,9 +675,25 @@ jobs:
needs:
- model_regression_test_cpu
- model_regression_test_gpu
if: always() && (needs.model_regression_test_cpu.result == 'success' || needs.model_regression_test_gpu.result == 'success')
if: ((needs.model_regression_test_cpu.result != 'skipped') != (needs.model_regression_test_gpu.result != 'skipped')) && always()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the "!=" instead of "||" here? (Guess doesn't make a difference anyway because it's either cpu or gpu who are executed and so both will never be true at the same time)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

|| gives OR behavior, while != gives an XOR behavior.

Explanation:
A run starts just to read the comments, but doesn't do anything else, e.g., https://github.com/RasaHQ/rasa/runs/4802907417?check_suite_focus=true.
If we write ||, the combine_reports would start (I ran into this issue in an earlier commit in the PR at hand). This is also connected to how the always() expression works.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok 👍 (what I had meant is that xor and or would effectively be the same because it seemed that both would not be non-skipped at the same time)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markus-hinsche can we merge this then? :)

@dakshvar22 dakshvar22 removed their request for review January 14, 2022 11:14
@markus-hinsche markus-hinsche merged commit 54823b6 into main Jan 14, 2022
@markus-hinsche markus-hinsche deleted the regr-test-failure-comment branch January 14, 2022 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants