Regression Test Failure Comment #10682

markus-hinsche · 2022-01-13T10:15:55Z

Motivation: Since #10574, the summary comment after an MR test run is no longer posted if a job fails.

Proposed changes:

Also add the comment in case of (partial) failure (and send the email in the same cases)
GH labels (status:model-regression-tests and runner: gpu) are now removed in both cases: success and failure
The result.json artifact contains (only) results from the jobs that succeeded
Implementation: Introduce a set_job_success_status job that sets the overall run status to failed if one of the jobs failed
Keep on-schedule as is (it doesn't have comments)
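
As a rough illustration of the status rule described above, here is a minimal Python sketch. It is not the workflow code from this PR: the function names and the job names in the example are hypothetical, and the only things taken as given are the GitHub Actions job result strings (`success`, `failure`, `skipped`) and the "Failed"/"Succeeded" wording used in the bot comments.

```python
from typing import Dict, List


def overall_run_status(job_results: Dict[str, str]) -> str:
    """Return "Failed" if at least one job failed, otherwise "Succeeded".

    Hypothetical helper: mirrors the rule "the overall run status is failed
    if one of the jobs failed". Skipped jobs do not count as failures here.
    """
    return "Failed" if "failure" in job_results.values() else "Succeeded"


def successful_jobs(job_results: Dict[str, str]) -> List[str]:
    """Names of the jobs whose results would end up in the report (assumption)."""
    return [name for name, result in job_results.items() if result == "success"]


if __name__ == "__main__":
    # Hypothetical job names, one per dataset/configuration pair.
    example = {"job_a": "success", "job_b": "failure", "job_c": "skipped"}
    print(overall_run_status(example))
    print(successful_jobs(example))
```

In this sketch a partial failure still yields a non-empty list of successful jobs, which matches the intent that a comment is posted with whatever results exist.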

Checked:

Case: all jobs fail
Case: all jobs succeed
Case: some jobs fail
Works for CPU and GPU

Status (please check what you already did):

added some tests for the functionality
updated the documentation
updated the changelog (please check changelog for instructions)
reformat files using black (please check Readme for instructions)

github-actions · 2022-01-13T10:16:54Z

Commit: b9dcd29, The full report is available as an artifact.

github-actions · 2022-01-13T10:18:15Z

Commit: b9dcd29, The full report is available as an artifact.

github-actions · 2022-01-13T10:34:47Z

Commit: 59c2768, The full report is available as an artifact.

github-actions · 2022-01-13T10:36:43Z

Commit: 59c2768, The full report is available as an artifact.

github-actions · 2022-01-13T11:11:20Z

Success status of the run:

Commit: 6efb166, The full report is available as an artifact.

github-actions · 2022-01-13T11:15:57Z

Success status of the run: Failed

Commit: 905d413, The full report is available as an artifact.

github-actions · 2022-01-13T11:21:17Z

Status of the run: Failed

Commit: e03333a, The full report is available as an artifact.

github-actions · 2022-01-13T11:33:09Z

Status of the run: Succeeded

Commit: e03333a, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m26s`, train: `2m51s`, total: `4m17s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

github-actions · 2022-01-13T11:56:11Z

Status of the run: Failed

Commit: b389ec1, The full report is available as an artifact.

github-actions · 2022-01-13T12:04:44Z

Status of the run: Failed

Commit: b389ec1, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m13s`, train: `2m57s`, total: `4m9s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

github-actions · 2022-01-13T12:46:01Z

Status of the run: Failed

Commit: 08eb052, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m39s`, train: `4m22s`, total: `6m1s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

github-actions · 2022-01-13T14:04:40Z

Status of the run: Failed

Commit: 7a1c76b, The full report is available as an artifact.

github-actions · 2022-01-13T14:05:59Z

Hey @markus-hinsche! 👋 To run model regression tests, comment with the /modeltest command and a configuration.

Tips 💡: The model regression tests run on push events. You can re-run the tests by re-adding the status:model-regression-tests label or by using the Re-run jobs button in the GitHub Actions workflow.

Tips 💡: Whenever you want to change the configuration, edit the comment that contains the previous configuration.

You can copy this in your comment and customize:

/modeltest

```yml
##########
## Available datasets
##########
# - "Carbon Bot" (NLU)
# - "Hermit" (NLU)
# - "Private 1" (NLU)
# - "Private 2" (NLU)
# - "Private 3" (NLU)
# - "Sara" (NLU, Core)
# - "financial-demo" (NLU, Core)
# - "helpdesk-assistant" (NLU, Core)
# - "insurance-demo" (NLU, Core)
# - "retail-demo" (NLU, Core)

##########
## Available NLU configurations
##########
# - "BERT + DIET(bow) + ResponseSelector(bow)"
# - "BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Spacy + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + BERT + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"

##########
## Available Core configurations
##########
# - "Rules"
# - "Rules + AugMemo"
# - "Rules + AugMemo + TED"
# - "Rules + Memo"
# - "Rules + Memo + TED"
# - "Rules + TED"

## Example configuration
#################### syntax #################
## include:
##   - dataset: ["<dataset_name>"]
##     config: ["<configuration_name>"]
#
## Example:
## include:
##  - dataset: ["Carbon Bot"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Shortcut:
## You can use the "all" shortcut to include all available configurations or datasets
#
## Example: Use the "Sparse + DIET(bow) + ResponseSelector(bow)" configuration
## for all available datasets
## include:
##  - dataset: ["all"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Example: Use all available configurations for the "Carbon Bot" and "Sara" datasets
## and for the "Hermit" dataset use the "Sparse + DIET(seq) + ResponseSelector(t2t)" and
## "BERT + DIET(seq) + ResponseSelector(t2t)" configurations:
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##  - dataset: ["Hermit"]
##    config: ["Sparse + DIET(seq) + ResponseSelector(t2t)", "BERT + DIET(seq) + ResponseSelector(t2t)"]
#
## Example: Define a branch name to check-out for a dataset repository. Default branch is 'main'
## dataset_branch: "test-branch"
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##
## Shortcuts:
## You can use the "all" shortcut to include all available configurations or datasets.
## You can use the "all-nlu" shortcut to include all available NLU configurations or datasets.
## You can use the "all-core" shortcut to include all available core configurations or datasets.

include:
 - dataset: ["Carbon Bot"]
   config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]

```
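
To illustrate how the `include` syntax and the "all" shortcut described in the template could be read, here is a minimal Python sketch. How the workflow actually expands the configuration is not shown in this thread, so the function names and the expansion logic are assumptions, and the dataset and configuration lists are only a small subset of the ones listed above.

```python
from itertools import product
from typing import Dict, List, Tuple

# Illustrative subsets of the lists in the template (assumption: not exhaustive).
AVAILABLE_DATASETS = ["Carbon Bot", "Hermit", "Sara"]
AVAILABLE_CONFIGS = [
    "Sparse + DIET(bow) + ResponseSelector(bow)",
    "Sparse + DIET(seq) + ResponseSelector(t2t)",
]


def expand(values: List[str], available: List[str]) -> List[str]:
    """Replace the "all" shortcut with every available entry."""
    return list(available) if values == ["all"] else list(values)


def expand_include(include: List[Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Turn `include` entries into (dataset, config) pairs to run."""
    pairs: List[Tuple[str, str]] = []
    for entry in include:
        datasets = expand(entry["dataset"], AVAILABLE_DATASETS)
        configs = expand(entry["config"], AVAILABLE_CONFIGS)
        # Every dataset in an entry is combined with every config in that entry.
        pairs.extend(product(datasets, configs))
    return pairs


if __name__ == "__main__":
    include = [
        {"dataset": ["all"], "config": ["Sparse + DIET(bow) + ResponseSelector(bow)"]},
    ]
    for dataset, config in expand_include(include):
        print(dataset, "|", config)
```

The "all-nlu" and "all-core" shortcuts are left out of this sketch because the thread does not say which datasets and configurations they cover.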

github-actions · 2022-01-13T14:06:04Z

/modeltest

include:
 - dataset: ["financial-demo", "helpdesk-assistant"]
   config: ["Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"]

github-actions · 2022-01-13T14:06:06Z

The model regression tests have started. This might take a while, so please be patient.
As soon as the results are ready, you'll see a new comment with them.

The configuration used can be found in the comment.

github-actions · 2022-01-13T14:18:55Z

Status of the run: Failed

Commit: 7a1c76b, The full report is available as an artifact.

Dataset: financial-demo, Dataset repository branch: fix-model-regression-tests (external repository), commit: 52a3ad3eb5292d56542687e23b06703431f15ead
Configuration repository branch: main

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| `Sparse + BERT + DIET(seq) + ResponseSelector(t2t)` test: `1m3s`, train: `3m25s`, total: `4m27s` | 1.0000 (0.00) | 0.8800 (0.00) | `no data` |

tczekajlo

LGTM 🍏

ka-bu · 2022-01-14T08:45:14Z

.github/workflows/ci-model-regression.yml

@@ -675,9 +675,25 @@ jobs:
    needs:
      - model_regression_test_cpu
      - model_regression_test_gpu
-    if: always() && (needs.model_regression_test_cpu.result == 'success' || needs.model_regression_test_gpu.result == 'success')
+    if: ((needs.model_regression_test_cpu.result != 'skipped') != (needs.model_regression_test_gpu.result != 'skipped')) && always()


Why the `!=` instead of `||` here? (I guess it doesn't make a difference anyway, because either the CPU or the GPU job is executed, so both will never be true at the same time.)

`||` gives OR behavior, while `!=` gives XOR behavior.

Explanation:
Some runs start just to read the comments and don't do anything else, e.g., https://github.com/RasaHQ/rasa/runs/4802907417?check_suite_focus=true.
If we write `||`, combine_reports would start (I ran into this issue in an earlier commit of this PR). This is also connected to how the `always()` expression works.

Ah ok 👍 (what I meant is that XOR and OR would effectively be the same, because it seemed that both jobs would never be non-skipped at the same time)
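
For readers following the OR/XOR point, here is a small Python sketch that tabulates both variants of the condition. It models only the boolean comparison of the two `needs.*.result` values. It does not model `always()` or anything else about how GitHub Actions evaluates the `if:` expression, and the function names are made up for illustration.

```python
from itertools import product

RESULTS = ["success", "failure", "skipped"]


def condition_or(cpu_result: str, gpu_result: str) -> bool:
    """`||` variant: true if at least one of the two jobs was not skipped."""
    return (cpu_result != "skipped") or (gpu_result != "skipped")


def condition_xor(cpu_result: str, gpu_result: str) -> bool:
    """`!=` variant: true if exactly one of the two jobs was not skipped."""
    return (cpu_result != "skipped") != (gpu_result != "skipped")


if __name__ == "__main__":
    print(f"{'cpu':<8} {'gpu':<8} {'or':<6} {'xor':<6}")
    for cpu, gpu in product(RESULTS, repeat=2):
        print(f"{cpu:<8} {gpu:<8} {condition_or(cpu, gpu)!s:<6} {condition_xor(cpu, gpu)!s:<6}")
```

In this pure boolean model, the two variants differ only in the rows where neither job is skipped. When both jobs are skipped, both variants are false.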

@markus-hinsche can we merge this then? :)

Introduce set_job_success_status job, Fail the test runs (for debugging purposes)

f92f566

markus-hinsche added the status:model-regression-tests label Jan 13, 2022