Commit
Implement telemetry reporting (#6619)
* implemented telemetry reporting
* fixed workflow secrets
* addressed deepsource issues
* ignore certain errors
* fixed syntax error
* integrated review suggestions
* fixed tests
* fixed async tests
* Apply suggestions from code review Co-authored-by: Federico Tedin <federicotdn@users.noreply.github.com>
* avoid blocking on telemetry calls
* fixed reporting of training for nlu / core only
* Apply suggestions from code review Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com>
* integrated review docs suggestions
* added training completed event
* fixed telemetry documentation
* fixed some tests
* trying to fix tests
* trying to make tests work
* fixed merge issues
* fixed telemetry link
* trying to fix async issue
* removed unecessary import
* fixed changelog link
* adressed review comments
* added tests for track
* fixed some typing issues
* fixed linter issues
* improved imports
* organize imports
* fixed some more deepsource issues
* Automated error reporting (#6656)
* implemented error reporting
* adapted ci structure to keys structure
* implemented key dumping
* added secret excplanation
* fixed telemetry link
* added instructions on where to find the sentry key
* added creation of releases on sentry
* Apply suggestions from code review Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
* addressed review comments Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
* fixed private var access
* fixed wrong arg
* fixed style errors
* fixed some more style errors
* fixed deepsource issues
* Update channel.py

Co-authored-by: Federico Tedin <federicotdn@users.noreply.github.com>
Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com>
Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
1 parent 4c8654c, commit 269bdaa
Showing 79 changed files with 2,084 additions and 396 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
Added telemetry reporting. Rasa uses telemetry to report anonymous usage information. | ||
This information is essential to help improve Rasa Open Source for all users. | ||
Reporting will be opt-out. More information can be found in our | ||
[telemetry documentation](./telemetry/telemetry.mdx). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
# Local Netlify folder | ||
.netlify | ||
.netlify | ||
docs/telemetry/reference.mdx |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
{ | ||
"sections": [ | ||
"Model Training", | ||
"Miscellaneous" | ||
], | ||
"defaultSection": "Miscellaneous", | ||
"events": { | ||
"Telemetry Disabled": { | ||
"description": "Triggered when telemetry reporting gets disabled. Last event sent before disabling telemetry. This event is not sent if the user never enabled telemetry reporting before deactivating it." | ||
}, | ||
"Training Started": { | ||
"description": "A training of a Rasa machine learning model got started. The event provides information on aggregated training data statistics.", | ||
"type": "object", | ||
"section": "Model Training", | ||
"properties": { | ||
"language": { | ||
"type": "string", | ||
"minLength": 1, | ||
"description": "Language the model is trained with, e.g. 'en'." | ||
}, | ||
"training_id": { | ||
"type": "string", | ||
"minLength": 1, | ||
"description": "Generated unique identifier for this training." | ||
}, | ||
"model_type": { | ||
"type": "string", | ||
"description": "Type of model trained, either 'nlu', 'core' or 'rasa'." | ||
}, | ||
"pipeline": { | ||
"oneOf": [ | ||
{ "type": "string"}, | ||
{ "type": "array", "items": {"type": "object"}} | ||
], | ||
"description": "List of the pipeline configurations used for training." | ||
}, | ||
"policies": { | ||
"type": "array", | ||
"items": { | ||
"type": "object" | ||
}, | ||
"description": "List of the policy configurations used for training." | ||
}, | ||
"num_intent_examples": { | ||
"type": "integer", | ||
"description": "Number of NLU examples." | ||
}, | ||
"num_entity_examples": { | ||
"type": "integer", | ||
"description": "Number of entity examples." | ||
}, | ||
"num_actions": { | ||
"type": "integer", | ||
"description": "Number of actions defined in the domain." | ||
}, | ||
"num_templates": { | ||
"type": "integer", | ||
"description": "Number of templates defined in the domain." | ||
}, | ||
"num_slots": { | ||
"type": "integer", | ||
"description": "Number of slots defined in the domain." | ||
}, | ||
"num_forms": { | ||
"type": "integer", | ||
"description": "Number of forms defined in the domain." | ||
}, | ||
"num_intents": { | ||
"type": "integer", | ||
"description": "Number of intents defined in the domain." | ||
}, | ||
"num_entities": { | ||
"type": "integer", | ||
"description": "Number of entities defined in the domain." | ||
}, | ||
"num_story_steps": { | ||
"type": "integer", | ||
"description": "Number of story steps available." | ||
}, | ||
"num_lookup_tables": { | ||
"type": "integer", | ||
"description": "Number of different lookup tables." | ||
}, | ||
"num_synonyms": { | ||
"type": "integer", | ||
"description": "Total number of entity synonyms defined." | ||
}, | ||
"num_regexes": { | ||
"type": "integer", | ||
"description": "Total number of regexes defined." | ||
} | ||
}, | ||
"additionalProperties": false, | ||
"required": [ | ||
"language", | ||
"training_id", | ||
"model_type", | ||
"pipeline", | ||
"policies", | ||
"num_intent_examples", | ||
"num_entity_examples", | ||
"num_actions", | ||
"num_templates", | ||
"num_slots", | ||
"num_forms", | ||
"num_intents", | ||
"num_entities", | ||
"num_story_steps", | ||
"num_lookup_tables", | ||
"num_synonyms", | ||
"num_regexes" | ||
] | ||
}, | ||
"Training Completed": { | ||
"description": "The training of a Rasa machine learning model finished. The event provides information about the resulting model.", | ||
"type": "object", | ||
"section": "Model Training", | ||
"properties": { | ||
"training_id": { | ||
"type": "string", | ||
"minLength": 1, | ||
"description": "Generated unique identifier for this training. Can be used to join with 'Training Started'." | ||
}, | ||
"model_type": { | ||
"type": "string", | ||
"description": "Type of model trained, either 'nlu', 'core' or 'rasa'." | ||
}, | ||
"runtime": { | ||
"type": "integer", | ||
"description": "The time in seconds it took to train the model." | ||
} | ||
}, | ||
"additionalProperties": false, | ||
"required": [ | ||
"training_id", | ||
"model_type", | ||
"runtime" | ||
] | ||
} | ||
} | ||
} |
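The event schema above constrains each event through `required`, per-property `type` and `minLength`, and `additionalProperties: false`. As a rough illustration of what those constraints mean for a "Training Completed" payload, here is a minimal standard-library sketch. The function name `validate_event` and the constant names are hypothetical and not part of this commit; a real project would more likely rely on a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Subset of the "Training Completed" schema from the diff above.
TRAINING_COMPLETED_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "training_id": {"type": "string", "minLength": 1},
        "model_type": {"type": "string"},
        "runtime": {"type": "integer"},
    },
    "additionalProperties": False,
    "required": ["training_id", "model_type", "runtime"],
}

# Only the two JSON types used by this schema subset are handled (assumption).
_JSON_TYPES = {"string": str, "integer": int}


def validate_event(properties: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload is valid."""
    problems: List[str] = []
    declared = schema.get("properties", {})

    # Every name listed under "required" must be present.
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f"missing required property '{name}'")

    for name, value in properties.items():
        if name not in declared:
            # "additionalProperties": false forbids undeclared keys.
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected property '{name}'")
            continue
        spec = declared[name]
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool is a subclass of int in Python but is not a JSON integer.
        if expected is not None and (
            not isinstance(value, expected) or isinstance(value, bool)
        ):
            problems.append(f"property '{name}' should be of type {spec['type']}")
            continue
        if isinstance(value, str) and len(value) < spec.get("minLength", 0):
            problems.append(f"property '{name}' is shorter than minLength")
    return problems
```

Calling `validate_event` with a payload that omits `runtime`, or that carries an undeclared key, returns one message per violation, which mirrors how the schema would reject such an event.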
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
--- | ||
id: telemetry | ||
sidebar_label: Rasa Telemetry | ||
title: Rasa Telemetry | ||
abstract: | | ||
Rasa uses telemetry to report anonymous usage information. This information | ||
is essential to help improve Rasa Open Source for all users. | ||
--- | ||
|
||
For the team working on Rasa Open Source it is important to understand | ||
how the product is used. It allows us to properly prioritize our research | ||
efforts and feature development. | ||
|
||
You will be notified about the telemetry reporting when running Rasa Open Source | ||
for the first time. | ||
|
||
## How to opt out | ||
|
||
You can opt out of telemetry reporting at any time by running the command: | ||
```bash | ||
rasa telemetry disable | ||
``` | ||
|
||
or by defining `RASA_TELEMETRY_ENABLED=false` as an environment variable. | ||
If you want to enable reporting again, you can run: | ||
```bash | ||
rasa telemetry enable | ||
``` | ||
|
||
## Why do we use telemetry reporting? | ||
|
||
**Anonymous** telemetry data allow us to prioritize our research efforts | ||
and feature development based on usage. We want to collect aggregated | ||
information on usage and reliability so that we can ensure a high-quality product. | ||
|
||
So how will we use the reported telemetry data? Here are some examples | ||
of what we use the data for: | ||
|
||
- We will be able to know which languages, pipelines and policies are used. | ||
This will enable us to direct our research efforts towards text and | ||
dialogue handling projects that will have the biggest impact for our users. | ||
- We will be able to know data set sizes and general structure (e.g. the number | ||
of intents). This allows us to better test our software on different types | ||
of data sets and optimize the framework's performance. | ||
- We will be able to get more detail on the types of errors you are running | ||
into while building an assistant (e.g. initialization, training, etc.). | ||
This will let us improve the quality of our framework and better focus our | ||
time on solving more common, frustrating issues. | ||
|
||
## What about sensitive data? | ||
|
||
Your sensitive data never leaves your machine. We: | ||
- **don't** report any personally identifiable information | ||
- **don't** report your training data | ||
- **don't** report any messages your assistant receives or sends | ||
|
||
:::note Inspect what is reported | ||
You can view all the telemetry information that is reported | ||
by defining the environment variable `RASA_TELEMETRY_DEBUG=true`, for example when running the train command: | ||
```bash | ||
RASA_TELEMETRY_DEBUG=true rasa train | ||
``` | ||
When you set `RASA_TELEMETRY_DEBUG`, no information will be sent to any server. | ||
Instead, it will be logged to the command line as a JSON dump for you to inspect. | ||
::: | ||
|
||
## What do we report? | ||
|
||
Rasa reports aggregated usage details, command invocations, performance | ||
measurements and errors. | ||
We use the telemetry data to better understand usage patterns. The reported data | ||
will directly allow us to better decide how to design future features | ||
and prioritize current work. | ||
|
||
Specifically, we collect the following information for all telemetry events: | ||
|
||
- Type of the reported event (e.g. *Training Started*) | ||
- Rasa machine ID: This is generated with a UUID and stored in the global Rasa | ||
config at `~/.config/rasa/global.yml` and sent as `metrics_id` | ||
- One-way hash of the current working directory or a hash of the git remote | ||
- General OS level information (operating system, number of CPUs, number of | ||
GPUs and whether the command is run inside a CI) | ||
- Current Rasa Open Source and Python version | ||
|
||
Here is an example report that shows the data reported to Rasa after running | ||
`rasa train`: | ||
```json | ||
{ | ||
"userId": "38d23c36c9be443281196080fcdd707d", | ||
"event": "Training Started", | ||
"properties": { | ||
"language": "en", | ||
"num_intent_examples": 68, | ||
"num_entity_examples": 0, | ||
"num_actions": 17, | ||
"num_templates": 6, | ||
"num_slots": 0, | ||
"num_forms": 0, | ||
"num_intents": 6, | ||
"num_entities": 0, | ||
"num_story_steps": 5, | ||
"num_lookup_tables": 0, | ||
"num_synonyms": 0, | ||
"num_regexes": 0, | ||
"metrics_id": "38d23c36c9be443281196080fcdd707d" | ||
}, | ||
"context": { | ||
"os": { | ||
"name": "Darwin", | ||
"version": "19.4.0" | ||
}, | ||
"ci": false, | ||
"project": "a0a7178e6e5f9e6484c5cfa3ea4497ffc0c96d0ad3f3ad8e9399a1edd88e3cf4", | ||
"python": "3.7.5", | ||
"rasa_open_source": "2.0.0", | ||
"gpu": 0, | ||
"cpu": 16 | ||
} | ||
} | ||
``` | ||
|
||
We **cannot identify individual users** from the dataset. It is anonymized and | ||
untraceable back to the user. |
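The telemetry documentation above mentions an opt-out environment variable, a UUID-based machine ID, and a one-way hash of the working directory. The following is a minimal sketch of how such values could be derived. It is not the code from this commit: the function names are hypothetical, treating only the string `false` (case-insensitive) as "disabled" is an assumption, and SHA-256 is an assumed choice of hash (the documentation only says "one-way hash").

```python
import hashlib
import os
import uuid
from typing import Mapping


def telemetry_enabled(env: Mapping[str, str], default: bool = True) -> bool:
    """Read the opt-out flag; reporting is opt-out, so the default is True."""
    raw = env.get("RASA_TELEMETRY_ENABLED")
    if raw is None:
        return default
    # Assumption: only the literal "false" (any casing) disables reporting.
    return raw.strip().lower() != "false"


def new_machine_id() -> str:
    """Generate a random identifier, a 32-character hex string."""
    return uuid.uuid4().hex


def project_fingerprint(path: str) -> str:
    """One-way hash of a directory path (SHA-256 is an assumption)."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("enabled:", telemetry_enabled(os.environ))
    print("machine id:", new_machine_id())
    print("project:", project_fingerprint(os.getcwd()))
```

Because the fingerprint is a hash rather than the path itself, the same directory always maps to the same value without the path being transmitted.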