Description
What would you improve?
The way model callbacks are integrated isn't optimal: they are "hidden" behind heuristics.
Also, since model callbacks are tightly bound to weak supervision runs, this might be the right time to think about how best to compare the two. As a user, I want to be able to quickly compare my weak supervision results with my production model results.
This should be closely tied to our UX redesign towards refinery 2.0.