Description of Problem:
We recently identified a memory leak in the training of Rasa models.
This leak was on the main branch, but was only detected when a new change made it severe enough to crash CI test workers.
After much investigation, the leak was narrowed down to a bit of TensorFlow code and fixed.
Ideally this memory leak would have been caught sooner, as it would have affected users.
The leak was apparent when either training with a high number of epochs, or training multiple times (like in the test suite).
We would like to have an automated check to test that we don't introduce another memory leak.
Overview of the Solution:
TensorFlow memory leaks can be hard to identify and fix, as they often occur while the graph is being executed, which may happen in C code, for example. This means the "leaking" variables are often not visible when inspecting all the Python objects in the interpreter. However, you can still detect a memory leak by looking at the total memory usage of the process.
We used https://pypi.org/project/memory-profiler/ to track the memory usage of a python process when training the TED model to find the leak mentioned above.
This tool tracks the total memory usage over time and writes it to a file which can be parsed or plotted.
To use this in an automated fashion we could:
- Create a test which trains a model with dummy data but a high number of epochs
- Run this python process wrapped in the profiler
- Analyse the output to see the trend of the total memory usage
We could have a threshold that if crossed fails the test, e.g. 1GB
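The analysis step above could be sketched as follows (a rough illustration, not a final design; the function name and default thresholds are hypothetical). It fails the check if either the peak memory crosses the threshold or memory grows steadily over time, which a simple least-squares slope can capture:

```python
def memory_leak_check(samples, max_peak_mib=1024.0, max_slope_mib_per_s=1.0):
    """Given (timestamp, mem_mib) samples, return True if memory usage
    looks healthy: peak stays under max_peak_mib and the least-squares
    slope of memory over time stays under max_slope_mib_per_s."""
    timestamps = [t for t, _ in samples]
    mems = [m for _, m in samples]
    peak = max(mems)
    n = len(samples)
    mean_t = sum(timestamps) / n
    mean_m = sum(mems) / n
    denom = sum((t - mean_t) ** 2 for t in timestamps)
    # A flat or single-timestamp series has no measurable growth trend.
    if denom == 0:
        slope = 0.0
    else:
        slope = sum(
            (t - mean_t) * (m - mean_m) for t, m in zip(timestamps, mems)
        ) / denom
    return peak <= max_peak_mib and slope <= max_slope_mib_per_s
```

A steadily climbing trace (a leak) fails on the slope check even if it never reaches the absolute threshold, while a high but flat trace only fails if it exceeds the peak limit.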
Definition of Done:
- The check can identify the bug that is mentioned in the description.
- The check works in the CI.
- The check works locally.