Since it is difficult to assess precision and recall directly at the MoA level, we should evaluate performance at a higher level. For example, when there are two or three replicates of the same treatment, we expect them to score similarly across all replicates. This implies that the average precision score should be relatively high, and that the buscar scores should remain consistent across replicates.