Add validation to movie-lens benchmark #441
Merged
The benchmark originally returned an empty set of recommended movies because a userID of 0 was passed to the predict method. User ID 0 is undefined (the model was never trained with ratings from that user), so predict returned nothing. The fix retrieves the userID from the 'ratings-personal.csv' file instead.
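The fix can be sketched as follows. This is a minimal Python illustration, not the benchmark's actual (likely Scala/Spark) code; it assumes the standard MovieLens CSV layout (`userId,movieId,rating,timestamp` with a header row), which the real file may or may not follow:

```python
import csv

def read_personal_user_id(path="ratings-personal.csv"):
    """Return the userId from the first data row of the ratings file.

    Assumes the MovieLens column order (userId,movieId,rating,timestamp)
    and a header row; the benchmark's real parsing code may differ.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)               # skip the header row
        first_row = next(reader)   # first rating written by the personal user
        return int(first_row[0])
```

Passing this value to predict instead of the hardcoded 0 ensures the model was actually trained with that user's ratings.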
The validation consists of two steps. First, it checks that the specified number of top recommended movies contains a defined set of expected movies. Second, it compares the Root Mean Square Error (RMSE) achieved by the best model on the validation subset against the expected value.
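The two steps could look roughly like this. The function name, arguments, and tolerance are all hypothetical, chosen only to illustrate the described checks:

```python
def validate(top_recommendations, expected_movies,
             achieved_rmse, expected_rmse, tolerance=0.05):
    """Illustrative two-step validation (names and tolerance are made up).

    Step 1: the top-N recommendation list must contain every expected movie.
    Step 2: the best model's RMSE on the validation subset must be close
            to the expected value.
    """
    missing = set(expected_movies) - set(top_recommendations)
    if missing:
        raise AssertionError(f"missing expected movies: {missing}")
    if abs(achieved_rmse - expected_rmse) > tolerance:
        raise AssertionError(
            f"RMSE {achieved_rmse} deviates from expected {expected_rmse}")
```

A tolerance on the RMSE comparison (rather than exact equality) accounts for run-to-run nondeterminism in model training.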
The validation only minimally changes the code being measured and thus should have little impact on benchmark performance. As a quick check, I computed the average duration of the last 5 repetitions of a single benchmark run, before and after adding validation:
without validation: 6916.339 ms (movie-lens.no-validation.result.txt)
with validation: 6973.107 ms (movie-lens.with-validation.result.txt)
The variant with validation appears approximately 1% slower, but this is based on a single run; if needed, I can repeat the comparison across multiple runs.
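For reference, the relative slowdown follows directly from the two averages above:

```python
# Mean duration of the last 5 repetitions, in milliseconds (from the
# attached result files).
without_validation = 6916.339
with_validation = 6973.107

# Relative slowdown of the validated variant, as a percentage.
slowdown_pct = (with_validation - without_validation) / without_validation * 100
print(f"{slowdown_pct:.2f}% slower")  # about 0.8%, i.e. roughly 1%
```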