What optimization framework can this method be applied? What the difference with supervised learning, and Deep Reinforcement Learning?