Closed
Description
Description of Problem: Some examples may be dropped from the training set because of how Rasa's splitting algorithm rounds the split. The issue is described in detail in a forum thread.
Overview of the Solution: rasa data split does not produce the exact expected number of training samples.
Say overall we have X samples (x1 samples of label l1, x2 samples of label l2, …) and a training-fraction of 0.8.
(Note: x1 + x2 + … = X).
In Rasa's code, the fraction is applied to each label separately and each product is truncated, so the number of training samples is A = int(0.8 * x1) + int(0.8 * x2) + …
Mathematically, A ≤ int(0.8 * X).
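The inequality follows because int() truncates a non-negative number to its floor, so each term satisfies int(0.8 * xi) ≤ 0.8 * xi. Summing over all k labels (k is a symbol introduced here for the number of labels) and using the fact that A is an integer:

$$
A = \sum_{i=1}^{k} \lfloor 0.8\,x_i \rfloor \;\le\; \sum_{i=1}^{k} 0.8\,x_i = 0.8\,X
\quad\Longrightarrow\quad
A \le \lfloor 0.8\,X \rfloor .
$$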
So the number of missing samples is int(0.8 * X) - A.
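A small sketch to reproduce the discrepancy, assuming per-label truncation as described above. It does not call Rasa's actual split code; the function names, labels and counts below are hypothetical. Because each label loses only its fractional part, the gap is at most k - 1 samples for k labels. The last function shows one possible remedy (the largest-remainder method, not something Rasa is known to use): start from the per-label floors and hand the leftover samples to the labels with the largest fractional remainders.

```python
from collections import Counter


def per_label_train_count(label_counts, training_fraction=0.8):
    """Training-set size when the fraction is applied and truncated per label."""
    return sum(int(training_fraction * n) for n in label_counts.values())


def global_train_count(label_counts, training_fraction=0.8):
    """Training-set size one would expect: truncate the fraction of all samples."""
    return int(training_fraction * sum(label_counts.values()))


def largest_remainder_train_counts(label_counts, training_fraction=0.8):
    """Hypothetical fix: per-label counts that add up to the global target."""
    target = global_train_count(label_counts, training_fraction)
    quotas = {label: training_fraction * n for label, n in label_counts.items()}
    counts = {label: int(q) for label, q in quotas.items()}
    leftover = target - sum(counts.values())
    # Give one extra sample to each of the labels with the largest remainders.
    by_remainder = sorted(quotas, key=lambda l: quotas[l] - counts[l], reverse=True)
    for label in by_remainder[:leftover]:
        counts[label] += 1
    return counts


# Hypothetical data: X = 10 samples spread over three labels.
labels = ["greet"] * 3 + ["goodbye"] * 3 + ["affirm"] * 4
counts = Counter(labels)

A = per_label_train_count(counts)      # int(2.4) + int(2.4) + int(3.2)
expected = global_train_count(counts)  # int(8.0)
print(A, expected, expected - A)       # prints 7 8 1
print(sum(largest_remainder_train_counts(counts).values()))  # prints 8
```

Rounding per label keeps the split stratified, which is presumably why it is done that way; the largest-remainder variant keeps that property while restoring the expected total.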