grim -t png (under sway, though this seems compositor-independent) is about 3x slower than what I'm used to under X11 with scrot, and about 10x slower than grim -t ppm.
Typical timings for a 1920x1080 screenshot on an old but serviceable desktop:
scrot -q 100 /tmp/out: 220420µs
grim -t ppm /tmp/out: 65673µs
grim -t png /tmp/out: 602272µs
These were obtained by wrapping each command in taskset -c 0 time. Without taskset -c 0 the variance was much higher and grim -t png took as long as 2.3s, far longer than users expect a screenshot to take. Even 0.6s is pretty bad.
This doesn't seem to be a fundamental issue with the PNG format. Rather, it seems to be caused by grim doing more compression than necessary (though less than dedicated PNG optimizers do). For example, this command:
grim -t ppm /dev/stdout | taskset -c 0 convert ppm:/dev/stdin -define png:compression-level=0 -define png:compression-filter=0 -define png:color-type=2 "${1:-/dev/stdout}"
usually runs in about 163000µs, measured the same way.
grim -t png spends almost all its runtime in cairo_surface_write_to_png, and most of that time in deflate. I don't see any way to configure cairo to do less compression in cairo_surface_write_to_png, but a reimplementation could change the options passed to libpng.
See flamegraph (pdf because GH doesn't allow uploading .svg).