Most of the existing models are computationally lightweight, so the benchmark numbers are slightly misleading: they are dominated by various overheads in DynamicPPL rather than by the actual model computation.
See, e.g., chalk-lab/Mooncake.jl#571 (comment) for an example where Mooncake's performance is often similar to Enzyme's and better than that of compiled ReverseDiff.
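
To illustrate why overhead-dominated timings can hide real differences between backends, here is a minimal toy cost model. It is only a sketch: the names `total_time` and `slowdown` are hypothetical helpers, every number is made up rather than measured, and it assumes the fixed overhead is identical for both backends, which need not hold in practice.

```julia
# Toy cost model: every number below is hypothetical, not a measurement.
# total time = fixed framework overhead + backend-specific compute time
total_time(overhead, compute) = overhead + compute

# Ratio of total times for two backends that share the same fixed overhead.
slowdown(overhead, compute_a, compute_b) =
    total_time(overhead, compute_b) / total_time(overhead, compute_a)

compute_a = 1.0   # hypothetical: backend A's time on the model computation (µs)
compute_b = 3.0   # hypothetical: backend B is 3x slower on the computation itself

for overhead in (0.0, 10.0, 100.0)   # hypothetical fixed overhead (µs)
    ratio = slowdown(overhead, compute_a, compute_b)
    println("overhead = ", overhead, " µs: B/A = ", round(ratio; digits = 2))
end
```

Under these assumptions, the 3x gap in pure computation shrinks to roughly 1.18x with 10 µs of shared overhead and to roughly 1.02x with 100 µs, so backends that differ substantially on heavy models can look nearly identical on lightweight ones.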