
Rasa splitting algorithm does not give the precise number of training samples. #6582

Closed
@duongkstn

Description

Description of Problem: Some examples may be dropped by Rasa's splitting algorithm. The issue is described in detail in the forum thread.

Overview of the Solution: rasa data split does not produce the expected number of training samples.
Say we have X samples overall (x1 samples of label l1, x2 samples of label l2, …) and training-fraction is 0.8.
(Note: x1 + x2 + … = X.)
In the Rasa code, the number of training samples is A = int(0.8 * x1) + int(0.8 * x2) + …
Mathematically, A ≤ int(0.8 * X), because int() truncates each label's share before the shares are summed.
So the number of missing samples is int(0.8 * X) - A.
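
To make the arithmetic concrete, here is a minimal sketch of the two counts compared above. It is not Rasa's actual code: the function names and the example label counts are hypothetical, and the per-label truncation simply mirrors the formula for A given in this issue.

```python
from typing import Dict


def per_label_train_count(label_counts: Dict[str, int], fraction: float) -> int:
    """A = int(fraction * x1) + int(fraction * x2) + ... (truncate per label, then sum)."""
    return sum(int(fraction * count) for count in label_counts.values())


def overall_train_count(label_counts: Dict[str, int], fraction: float) -> int:
    """int(fraction * X), where X is the total number of samples."""
    return int(fraction * sum(label_counts.values()))


# Hypothetical data set: 20 samples spread over three labels.
label_counts = {"greet": 7, "goodbye": 7, "affirm": 6}
fraction = 0.8

a = per_label_train_count(label_counts, fraction)      # 5 + 5 + 4
expected = overall_train_count(label_counts, fraction)  # int(0.8 * 20)
missing = expected - a

print(f"A = {a}, int(0.8 * X) = {expected}, missing = {missing}")
# prints "A = 14, int(0.8 * X) = 16, missing = 2"
```

With these numbers, each label loses its fractional part (0.6, 0.6 and 0.8 of a sample), which adds up to two whole samples missing from the training set.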

Metadata

Labels

area:rasa-oss 🎡 Anything related to the open source Rasa framework
type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
