Possibility of incorporating RL

Reinforcement learning policies are usually ordinary neural networks (often MLPs), so in theory it should be possible to implement RL in OpenChem. In the literature, the reward is typically computed from a molecular property such as LogP or QED and folded into the training loss. Would such an approach be feasible in OpenChem?
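To make the question concrete, here is a minimal sketch of the reward-as-loss idea as a plain REINFORCE update. It is pure Python and makes no OpenChem calls. Everything in it is an assumption for illustration: the four-token vocabulary, the context-free policy (one logit per token), and `toy_reward`, which simply counts "C" tokens as a stand-in for a real property score such as LogP or QED.

```python
import math
import random

VOCAB = ["C", "N", "O", "F"]  # toy token set, not a real SMILES vocabulary
SEQ_LEN = 8                   # tokens per sampled sequence
BATCH = 16                    # sequences sampled per update


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(tokens):
    """Hypothetical stand-in for a property score such as LogP or QED."""
    return tokens.count("C") / len(tokens)


def reinforce(steps=300, lr=0.1, seed=0):
    """Train a context-free token policy with REINFORCE and a batch-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a batch of token-index sequences from the current policy.
        batch = [
            rng.choices(range(len(VOCAB)), weights=probs, k=SEQ_LEN)
            for _ in range(BATCH)
        ]
        rewards = [toy_reward([VOCAB[i] for i in seq]) for seq in batch]
        baseline = sum(rewards) / BATCH
        grad = [0.0] * len(VOCAB)
        for seq, reward in zip(batch, rewards):
            advantage = reward - baseline
            for k in range(len(VOCAB)):
                # d/d logit_k of sum_t log p(token_t) = count_k - SEQ_LEN * p_k
                grad[k] += advantage * (seq.count(k) - SEQ_LEN * probs[k]) / BATCH
        # Ascent on expected reward, i.e. descent on the loss -(R - b) * log p.
        logits = [l + lr * g for l, g in zip(logits, grad)]
    return softmax(logits)
```

Calling `reinforce()` returns the final token probabilities, and the probability of "C" should rise well above its initial 0.25. In a real setup the policy would be a sequence model and the reward would come from a cheminformatics toolkit, but the loss has the same shape.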

(I work on RL myself, but have no familiarity with molecular design.)