Commit

Implement telemetry reporting (#6619)
* implemented telemetry reporting

* fixed workflow secrets

* addressed deepsource issues

* ignore certain errors

* fixed syntax error

* integrated review suggestions

* fixed tests

* fixed async tests

* Apply suggestions from code review

Co-authored-by: Federico Tedin <federicotdn@users.noreply.github.com>

* avoid blocking on telemetry calls

* fixed reporting of training for nlu / core only

* Apply suggestions from code review

Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com>

* integrated review docs suggestions

* added training completed event

* fixed telemetry documentation

* fixed some tests

* trying to fix tests

* trying to make tests work

* fixed merge issues

* fixed telemetry link

* trying to fix async issue

* removed unnecessary import

* fixed changelog link

* addressed review comments

* added tests for track

* fixed some typing issues

* fixed linter issues

* improved imports

* organize imports

* fixed some more deepsource issues

* Automated error reporting (#6656)

* implemented error reporting

* adapted ci structure to keys structure

* implemented key dumping

* added secret explanation

* fixed telemetry link

* added instructions on where to find the sentry key

* added creation of releases on sentry

* Apply suggestions from code review

Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>

* addressed review comments

Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>

* fixed private var access

* fixed wrong arg

* fixed style errors

* fixed some more style errors

* fixed deepsource issues

* Update channel.py

Co-authored-by: Federico Tedin <federicotdn@users.noreply.github.com>
Co-authored-by: Akela Drissner-Schmid <32450038+akelad@users.noreply.github.com>
Co-authored-by: Tobias Wochinger <t.wochinger@rasa.com>
4 people authored Sep 14, 2020
1 parent 4c8654c commit 269bdaa
Showing 79 changed files with 2,084 additions and 396 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/continous-integration.yml
@@ -16,6 +16,13 @@ on:
# RasaHQ/rasa on pypi (account credentials in 1password)
# - DOCKERHUB_PASSWORD: password for an account with write access to the rasa
# repo on hub.docker.com. used to pull and upload containers
# - RASA_OSS_TELEMETRY_WRITE_KEY: key to write to segment. Used to report telemetry.
# The key will be added to the distributions
# - RASA_OSS_EXCEPTION_WRITE_KEY: key to write to sentry. Used to report exceptions.
# The key will be added to the distributions.
# Key can be found at https://sentry.io/settings/rasahq/projects/rasa-open-source/install/python/
# - SENTRY_AUTH_TOKEN: authentication used to tell Sentry about any new releases
# created at https://sentry.io/settings/account/api/auth-tokens/

env:
# needed to fix issues with boto during testing:
@@ -210,6 +217,14 @@ jobs:
- name: Pull latest${{ matrix.image.tag_ext }} Docker image for caching
run: docker pull rasa/rasa:latest${{ matrix.image.tag_ext }} || true

- name: Copy Segment write key to the package
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags') && github.repository == 'RasaHQ/rasa'
env:
RASA_TELEMETRY_WRITE_KEY: ${{ secrets.RASA_OSS_TELEMETRY_WRITE_KEY }}
RASA_EXCEPTION_WRITE_KEY: ${{ secrets.RASA_OSS_EXCEPTION_WRITE_KEY }}
run: |
./scripts/write_keys_file.sh
- name: Build latest${{ matrix.image.tag_ext }} Docker image
run: docker build . --file ${{ matrix.image.file }} --tag rasa/rasa:latest${{ matrix.image.tag_ext }} --cache-from rasa/rasa:latest${{ matrix.image.tag_ext }}

@@ -253,6 +268,13 @@ jobs:
with:
poetry-version: ${{ env.POETRY_VERSION }}

- name: Copy Segment write key to the package
env:
RASA_TELEMETRY_WRITE_KEY: ${{ secrets.RASA_OSS_TELEMETRY_WRITE_KEY }}
RASA_EXCEPTION_WRITE_KEY: ${{ secrets.RASA_OSS_EXCEPTION_WRITE_KEY }}
run: |
./scripts/write_keys_file.sh
- name: Build ⚒️ Distributions
run: poetry build

@@ -262,6 +284,17 @@ jobs:
user: __token__
password: ${{ secrets.PYPI_TOKEN }}

- name: Notify Sentry about the release
env:
GITHUB_TAG: ${{ github.ref }}
SENTRY_ORG: rasahq
SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
run: |
GITHUB_TAG=${GITHUB_TAG/refs\/tags\//}
sentry-cli releases new -p rasa-open-source "rasa-$GITHUB_TAG"
sentry-cli releases set-commits --auto "rasa-$GITHUB_TAG"
sentry-cli releases finalize "rasa-$GITHUB_TAG"
- name: Notify Slack & Publish Release Notes 🗞
env:
GH_RELEASE_NOTES_TOKEN: ${{ secrets.GH_RELEASE_NOTES_TOKEN }}
2 changes: 2 additions & 0 deletions .gitignore
@@ -86,6 +86,8 @@ docs/docs/variables.json
docs/docs/sources/
docs/docs/reference/
docs/docs/changelog.mdx
rasa/segment_key
rasa/keys

# Local Netlify folder
.netlify
4 changes: 4 additions & 0 deletions changelog/6613.improvement.md
@@ -0,0 +1,4 @@
Added telemetry reporting. Rasa uses telemetry to report anonymous usage information.
This information is essential to help improve Rasa Open Source for all users.
Reporting will be opt-out. More information can be found in our
[telemetry documentation](./telemetry/telemetry.mdx).
3 changes: 2 additions & 1 deletion docs/.gitignore
@@ -1,2 +1,3 @@
# Local Netlify folder
.netlify
.netlify
docs/telemetry/reference.mdx
7 changes: 7 additions & 0 deletions docs/docs/installation.mdx
@@ -167,6 +167,13 @@ without root privileges.
</TabItem>
</Tabs>

:::note Telemetry reporting
When you run Rasa Open Source for the first time, you’ll see a
message notifying you about anonymous usage data that is being collected.
You can read more about how that data is collected and what it is used for in the
[telemetry documentation](./telemetry/telemetry.mdx).
:::

**Congratulations! You have successfully installed Rasa Open Source!**

Next step: Start prototyping your first assistant online and download it afterwards
141 changes: 141 additions & 0 deletions docs/docs/telemetry/events.json
@@ -0,0 +1,141 @@
{
"sections": [
"Model Training",
"Miscellaneous"
],
"defaultSection": "Miscellaneous",
"events": {
"Telemetry Disabled": {
"description": "Triggered when telemetry reporting gets disabled. Last event sent before disabling telemetry. This event is not sent if the user never enabled telemetry reporting before deactivating it."
},
"Training Started": {
"description": "Training of a Rasa machine learning model started. The event provides information on aggregated training data statistics.",
"type": "object",
"section": "Model Training",
"properties": {
"language": {
"type": "string",
"minLength": 1,
"description": "Language the model is trained with, e.g. 'en'."
},
"training_id": {
"type": "string",
"minLength": 1,
"description": "Generated unique identifier for this training."
},
"model_type": {
"type": "string",
"description": "Type of model trained, either 'nlu', 'core' or 'rasa'."
},
"pipeline": {
"oneOf": [
{ "type": "string"},
{ "type": "array", "items": {"type": "object"}}
],
"description": "List of the pipeline configurations used for training."
},
"policies": {
"type": "array",
"items": {
"type": "object"
},
"description": "List of the policy configurations used for training."
},
"num_intent_examples": {
"type": "integer",
"description": "Number of NLU examples."
},
"num_entity_examples": {
"type": "integer",
"description": "Number of entity examples."
},
"num_actions": {
"type": "integer",
"description": "Number of actions defined in the domain."
},
"num_templates": {
"type": "integer",
"description": "Number of templates defined in the domain."
},
"num_slots": {
"type": "integer",
"description": "Number of slots defined in the domain."
},
"num_forms": {
"type": "integer",
"description": "Number of forms defined in the domain."
},
"num_intents": {
"type": "integer",
"description": "Number of intents defined in the domain."
},
"num_entities": {
"type": "integer",
"description": "Number of entities defined in the domain."
},
"num_story_steps": {
"type": "integer",
"description": "Number of story steps available."
},
"num_lookup_tables": {
"type": "integer",
"description": "Number of different lookup tables."
},
"num_synonyms": {
"type": "integer",
"description": "Total number of entity synonyms defined."
},
"num_regexes": {
"type": "integer",
"description": "Total number of regexes defined."
}
},
"additionalProperties": false,
"required": [
"language",
"training_id",
"model_type",
"pipeline",
"policies",
"num_intent_examples",
"num_entity_examples",
"num_actions",
"num_templates",
"num_slots",
"num_forms",
"num_intents",
"num_entities",
"num_story_steps",
"num_lookup_tables",
"num_synonyms",
"num_regexes"
]
},
"Training Completed": {
"description": "The training of a Rasa machine learning model finished. The event provides information about the resulting model.",
"type": "object",
"section": "Model Training",
"properties": {
"training_id": {
"type": "string",
"minLength": 1,
"description": "Generated unique identifier for this training. Can be used to join with 'Training Started'."
},
"model_type": {
"type": "string",
"description": "Type of model trained, either 'nlu', 'core' or 'rasa'."
},
"runtime": {
"type": "integer",
"description": "The time in seconds it took to train the model."
}
},
"additionalProperties": false,
"required": [
"training_id",
"model_type",
"runtime"
]
}
}
}
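
The `required` and `additionalProperties` rules in the schema above can be illustrated with a small check. This is a minimal sketch using only the standard library; the helper `check_event_properties` is hypothetical, and the schema literal is a trimmed copy of the "Training Completed" entry. It does not check property types, which a full JSON Schema validator would also do.

```python
from typing import Any, Dict, List

# Trimmed copy of the "Training Completed" entry from events.json.
TRAINING_COMPLETED_SCHEMA: Dict[str, Any] = {
    "properties": {
        "training_id": {"type": "string", "minLength": 1},
        "model_type": {"type": "string"},
        "runtime": {"type": "integer"},
    },
    "additionalProperties": False,
    "required": ["training_id", "model_type", "runtime"],
}


def check_event_properties(
    properties: Dict[str, Any], schema: Dict[str, Any]
) -> List[str]:
    """Hypothetical helper: list missing and unexpected property names."""
    problems: List[str] = []
    # Every name listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in properties:
            problems.append(f"missing required property: {key}")
    # With "additionalProperties": false, unknown names are rejected.
    if schema.get("additionalProperties") is False:
        allowed = set(schema.get("properties", {}))
        for key in sorted(set(properties) - allowed):
            problems.append(f"unexpected property: {key}")
    return problems
```

An empty list means the property names satisfy both rules; any other result names the properties that would need to be added or removed.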
123 changes: 123 additions & 0 deletions docs/docs/telemetry/telemetry.mdx
@@ -0,0 +1,123 @@
---
id: telemetry
sidebar_label: Rasa Telemetry
title: Rasa Telemetry
abstract: |
Rasa uses telemetry to report anonymous usage information. This information
is essential to help improve Rasa Open Source for all users.
---

For the team working on Rasa Open Source it is important to understand
how the product is used. It allows us to properly prioritize our research
efforts and feature development.

You will be notified about the telemetry reporting when running Rasa Open Source
for the first time.

## How to opt-out

You can opt out of telemetry reporting at any time by running the command:
```bash
rasa telemetry disable
```

or by defining `RASA_TELEMETRY_ENABLED=false` as an environment variable.
If you want to enable reporting again, you can run:
```bash
rasa telemetry enable
```
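
To illustrate how the `RASA_TELEMETRY_ENABLED` override could interact with the setting stored by the commands above, here is a minimal Python sketch. The helper name `is_telemetry_enabled`, the precedence of the environment variable over the stored setting, and the rule that only the value `true` enables reporting are assumptions for illustration, not a description of Rasa's actual implementation.

```python
import os
from typing import Mapping, Optional

TELEMETRY_ENV_VAR = "RASA_TELEMETRY_ENABLED"  # name taken from the docs


def is_telemetry_enabled(
    stored_setting: Optional[bool] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: decide whether telemetry should be reported."""
    env = os.environ if environ is None else environ
    raw = env.get(TELEMETRY_ENV_VAR)
    if raw is not None:
        # Assumption: the environment variable wins over the stored setting.
        return raw.strip().lower() == "true"
    if stored_setting is not None:
        # Value previously saved by `rasa telemetry enable` / `disable`.
        return stored_setting
    # Reporting is opt-out, so the default is enabled.
    return True
```

Passing `environ` explicitly keeps the function easy to test without modifying the real process environment.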

## Why do we use telemetry reporting?

**Anonymous** telemetry data allow us to prioritize our research efforts
and feature development based on usage. We want to collect aggregated
information on usage and reliability so that we can ensure a high-quality product.

So how will we use the reported telemetry data? Here are some examples
of what we use the data for:

- We will be able to know which languages, pipelines and policies are used.
This will enable us to direct our research efforts towards text and
dialogue handling projects that will have the biggest impact for our users.
- We will be able to know data set sizes and general structure (e.g. the number
of intents). This allows us to better test our software on different types
of data sets and optimize the framework's performance.
- We will be able to get more detail on the types of errors you are running
into while building an assistant (e.g. initialization, training, etc.).
This will let us improve the quality of our framework and better focus our
time on solving more common, frustrating issues.

## What about sensitive data?

Your sensitive data never leaves your machine. We:
- **don't** report any personally identifiable information
- **don't** report your training data
- **don't** report any messages your assistant receives or sends

:::note Inspect what is reported
You can view all the telemetry information that is reported
by defining the environment variable `RASA_TELEMETRY_DEBUG=true`, for example when running the train command:
```bash
RASA_TELEMETRY_DEBUG=true rasa train
```
When you set `RASA_TELEMETRY_DEBUG`, no information is sent to any server.
Instead, it is logged to the command line as a JSON dump for you to inspect.
:::
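
The debug behaviour described in the note can be sketched as follows. This is a minimal illustration, not Rasa's implementation: the function `report_event` and its `send` callback are hypothetical, and treating only the value `true` as "debug on" is an assumption.

```python
import json
import os
from typing import Any, Callable, Dict, Mapping, Optional

DEBUG_ENV_VAR = "RASA_TELEMETRY_DEBUG"  # name taken from the docs


def report_event(
    payload: Dict[str, Any],
    send: Callable[[Dict[str, Any]], None],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: print the event in debug mode, else send it."""
    env = os.environ if environ is None else environ
    if env.get(DEBUG_ENV_VAR, "").strip().lower() == "true":
        # Debug mode: nothing leaves the machine, the event is only printed.
        dump = json.dumps(payload, indent=2)
        print(dump)
        return dump
    send(payload)
    return None
```

The `send` callback stands in for whatever transport is used, so the debug branch can be verified without any network access.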

## What do we report?

Rasa reports aggregated usage details, command invocations, performance
measurements and errors.
We use the telemetry data to better understand usage patterns. The reported data
will directly allow us to better decide how to design future features
and prioritize current work.

Specifically, we collect the following information for all telemetry events:

- Type of the reported event (e.g. *Training Started*)
- Rasa machine ID: This is generated with a UUID and stored in the global Rasa
config at `~/.config/rasa/global.yml` and sent as `metrics_id`
- One-way hash of the current working directory or a hash of the git remote
- General OS level information (operating system, number of CPUs, number of
GPUs and whether the command is run inside a CI)
- Current Rasa Open Source and Python version
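
As a rough illustration of the two identifiers in the list above, the sketch below generates a UUID-based machine ID and a one-way hash of a directory path. The function names are hypothetical, and SHA-256 is an assumption: the documentation only states that a one-way hash is used.

```python
import hashlib
import uuid


def generate_metrics_id() -> str:
    """Hypothetical helper: random machine ID as 32 hex characters."""
    return uuid.uuid4().hex


def hash_project_path(path: str) -> str:
    """Hypothetical helper: one-way hash of a directory path.

    Assumption: SHA-256 is used here purely for illustration.
    """
    return hashlib.sha256(path.encode("utf-8")).hexdigest()
```

Because the hash is deterministic, the same directory always maps to the same project value, while the path itself is not part of the report.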

Here is an example report that shows the data reported to Rasa after running
`rasa train`:
```json
{
"userId": "38d23c36c9be443281196080fcdd707d",
"event": "Training Started",
"properties": {
"language": "en",
"num_intent_examples": 68,
"num_entity_examples": 0,
"num_actions": 17,
"num_templates": 6,
"num_slots": 0,
"num_forms": 0,
"num_intents": 6,
"num_entities": 0,
"num_story_steps": 5,
"num_lookup_tables": 0,
"num_synonyms": 0,
"num_regexes": 0,
"metrics_id": "38d23c36c9be443281196080fcdd707d"
},
"context": {
"os": {
"name": "Darwin",
"version": "19.4.0"
},
"ci": false,
"project": "a0a7178e6e5f9e6484c5cfa3ea4497ffc0c96d0ad3f3ad8e9399a1edd88e3cf4",
"python": "3.7.5",
"rasa_open_source": "2.0.0",
"gpu": 0,
"cpu": 16
}
}
```

We **cannot identify individual users** from the dataset. It is anonymized and
untraceable back to the user.
6 changes: 5 additions & 1 deletion docs/package.json
@@ -6,7 +6,7 @@
"theme:link": "yarn link @rasahq/docusaurus-theme-tabula",
"theme:install": "yarn unlink @rasahq/docusaurus-theme-tabula && yarn install",
"theme:upgrade": "yarn upgrade @rasahq/docusaurus-theme-tabula --latest",
"pre-build": "yarn copy-md-files && yarn variables && yarn program-outputs && yarn included-sources && yarn autodoc",
"pre-build": "yarn copy-md-files && yarn variables && yarn program-outputs && yarn included-sources && yarn autodoc && yarn telemetry",
"start": "yarn pre-build && yarn develop",
"develop": "docusaurus start --port 3000",
"build": "docusaurus build",
@@ -16,6 +16,7 @@
"variables": "node scripts/compile_variables.js",
"program-outputs": "node scripts/compile_program_outputs.js",
"copy-md-files": "node scripts/copy_md_files.js",
"telemetry": "node scripts/compile_telemetry_reference.js --unhandled-rejections=strict",
"included-sources": "node scripts/compile_included_sources.js",
"autodoc": "echo 'Generating autodoc' && pydoc-markdown",
"clean": "find docs/sources -type f -not -name '.keep' -print0 | xargs -0 -I {} rm {}",
@@ -91,5 +92,8 @@
"netlify-cli": "^2.59.0",
"prettier": "^2.0.5",
"toml": "^3.0.0"
},
"telemetryReference": {
"outputPath": "./docs/telemetry/reference.mdx"
}
}