Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vo_dmabuf_wayland: reject formats not supported by the GPU #14815

Merged
merged 5 commits into from
Sep 16, 2024

Conversation

Dudemanguy
Copy link
Member

@Dudemanguy Dudemanguy commented Sep 8, 2024

You need bad hardware to actually test this. If you don't have bad drivers with poor format support, nothing should change except you might technically do an extra autoconvert step. Couldn't avoid that sorry but just use --hwdec anyway.

Basically my card is unable to correctly display vaapi frames from nv12 and other several common formats. On master, the behavior I get is red frames with broken colors (e.g.: #13098). This can be worked around by manually specifying a format version beforehand (e.g. --vf=format=bgr0 in my case) which will then display the correct colors since my GPU does do bgr0.

Edit: removed some dated stuff since this changed alot.

Copy link

github-actions bot commented Sep 8, 2024

Download the artifacts for this pull request:

Windows
macOS

@kasper93
Copy link
Contributor

kasper93 commented Sep 8, 2024

I didn't read the code yet, but from the commit message I couldn't grasp what exactly is the problem with current hwupload implementation. Could you explain what conversion is not currently possible and/or doesn't work? On example, will be better to understand for me.

Just to reiterate how hwupload currently work. (I've debugged some d3d11 upload issue recently). We start from the list of hw formats, for each of this formats we find software format that is supported to be uploaded and filter hwformat with VO supported ones. If hwupload fails, because we cannot upload current swformat to hwformat we fallback to swscale and upload supported format. In your case there is no possible swscale conversion that would be compatible with the hwupload?

For example in d3d11 case, we can only upload nv12. So it first convert yuv420p to nv12 with swscale and upload that.

[autoconvert] HW-uploading to d3d11
[autoconvert] Converting yuv420p -> nv12
[hwupload] upload nv12 -> d3d11[nv12]

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 8, 2024

Are you referring to vf_d3d11? If so, that works because the vf_d3d11 filter creates another autoconvert filter under the hood via mp_refqueue_alloc. This acts first and that is the part that does the yuv420p -> nv12 conversion. The normal autoconvert filter that is always created in filters/f_output_chain.c acts after this. It sees the nv12 that was just converted by the filter and then does the hwupload. So in short, vf_d3d11 is also doing a two pass conversion process with two different autoconvert filters.

@Dudemanguy
Copy link
Member Author

But on some further thought, a VO optionally requesting another user filter that gets created in player/video.c is probably a cleaner implementation than what I have here. I'll poke around on that a bit.

@kasper93
Copy link
Contributor

kasper93 commented Sep 8, 2024

Are you referring to vf_d3d11? If so, that works because the vf_d3d11 filter creates another autoconvert filter under the hood via mp_refqueue_alloc. This acts first and that is the part that does the yuv420p -> nv12 conversion. The normal autoconvert filter that is always created in filters/f_output_chain.c acts after this.

The normal autoconvert filter is not relevant or used. vf_d3d11vpp requires hardware frames and the frame is fully uploaded before it.

@Dudemanguy
Copy link
Member Author

Huh I guess the refqueue handles it somehow (can't really test). One autoconvert alone is not sufficient. i.e. I get this:

[autoconvert] HW-uploading to vaapi
[hwupload] upload yuv420p -> vaapi[bgr0]
VO: [dmabuf-wayland] 1280x720 vaapi[bgr0]
[vo/dmabuf-wayland] vaExportSurfaceHandle() failed (an unsupported memory type was supplied)
[vo/dmabuf-wayland] Unable to get drm format from hardware decoding!
Could not initialize video chain

When what I really want is this:

[autoconvert] Converting yuv420p -> bgr0
[autoconvert] HW-uploading to vaapi
[hwupload] upload bgr0 -> vaapi[bgr0]
VO: [dmabuf-wayland] 1280x720 vaapi[bgr0]
V: 00:00:00 / 00:00:31 (2%) DS: 2.5333/0

@kasper93
Copy link
Contributor

kasper93 commented Sep 8, 2024

[hwupload] upload yuv420p -> vaapi[bgr0]

Looks like the hwupload were successful, if this upload paths doesn't work, you need to exclude yuv420p from supported formats to be uploaded.

Take a look at --msg-level=hwupload=trace output. supports: line denotes what software formats are able to be uploaded for given hardware format. As reported by av_hwframe_transfer_get_formats() as I understand your hardware is not capable of uploading yuv420p -> vaapi[bgr0] directly, so either av_hwframe_transfer_get_formats() should be fixed to not report it as supported or we should exclude this format ourselves.

You could try adding skip in this loop

int index = p->num_fmts;
or rather
int fmt = pixfmt2imgfmt(fmts[i]);

exclude yuv420p from adding to supported formats.

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 8, 2024

It is already being excluded no?

[   0.213][d][hwupload] looking at format vaapi/nv12
[   0.214][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.214][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.214][d][hwupload] looking at format vaapi/p010
[   0.214][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.214][d][hwupload]   not supported by VO: p010 nv12 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.215][d][hwupload] looking at format vaapi/yuv420p
[   0.215][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.215][d][hwupload]   not supported by VO: yuv420p nv12 p010 yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.215][d][hwupload] looking at format vaapi/yuyv422
[   0.215][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.215][d][hwupload]   not supported by VO: yuyv422 nv12 p010 yuv420p yuv420p uyvy422 gray yuv444p yuv440p argb
[   0.215][d][hwupload] looking at format vaapi/uyvy422
[   0.216][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.216][d][hwupload]   not supported by VO: uyvy422 nv12 p010 yuv420p yuv420p yuyv422 gray yuv444p yuv440p argb
[   0.216][d][hwupload] looking at format vaapi/gray
[   0.216][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.216][d][hwupload]   not supported by VO: gray nv12 p010 yuv420p yuv420p yuyv422 uyvy422 yuv444p yuv440p argb
[   0.216][d][hwupload] looking at format vaapi/yuv444p
[   0.216][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.217][d][hwupload]   not supported by VO: yuv444p nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv440p argb
[   0.217][d][hwupload] looking at format vaapi/yuv440p
[   0.217][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.217][d][hwupload]   not supported by VO: yuv440p nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p argb
[   0.217][d][hwupload] looking at format vaapi/bgra
[   0.217][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.217][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.217][d][hwupload] looking at format vaapi/rgba
[   0.218][d][hwupload]   supports: rgba bgra bgr0 rgb0 rgb30
[   0.218][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.218][d][hwupload] looking at format vaapi/argb
[   0.218][d][hwupload]   supports: bgra rgba bgr0 rgb0 rgb30
[   0.218][d][hwupload]   not supported by VO: argb nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p
[   0.218][d][hwupload] looking at format vaapi/bgr0
[   0.218][d][hwupload]   supports: bgr0 bgra rgba rgb0 rgb30
[   0.219][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.219][d][hwupload] looking at format vaapi/rgb0
[   0.219][d][hwupload]   supports: rgb0 bgra rgba bgr0 rgb30
[   0.219][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb
[   0.219][d][hwupload] looking at format vaapi/rgb30
[   0.219][d][hwupload]   supports: rgb30 bgra rgba bgr0 rgb0
[   0.219][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p argb

@philipl
Copy link
Member

philipl commented Sep 8, 2024

There is no standard way to filter formats based on what the vo supports today, so in that sense, you are trying to fix a valid limitation. We do support filtering based on supported ra formats, although we discussed this for dmabuf-Wayland before and there probably isn't a sane way to express the supported formats in terms of ra formats.

The issue here, I think, is vaapi lying about what formats it supports. It claims yuv420p and then the hwupload fails. If that was not claimed, it would probably auto convert correctly? I tried to make all of this smarter but it still relies on an accurate evaluation of vaapi capabilities,

@kasper93
Copy link
Contributor

kasper93 commented Sep 8, 2024

It is already being excluded no?

[   0.218][d][hwupload] looking at format vaapi/bgr0
[   0.218][d][hwupload]   supports: bgr0 bgra rgba rgb0 rgb30
[   0.219][d][hwupload]   not supported by VO: nv12 p010 yuv420p yuv420p yuyv422 uyvy422 gray yuv444p yuv440p arg

Indeed it looks correct. So why does it go through and do [hwupload] upload yuv420p -> vaapi[bgr0] which is clearly not in the supported set of formats. Something is fishy here.

Could you step through select_format() in debugger or add some more traces there and see what is selected... and why?

The issue here, I think, is vaapi lying about what formats it supports. It claims yuv420p and then the hwupload fails. If that was not claimed, it would probably auto convert correctly? I tried to make all of this smarter but it still relies on an accurate evaluation of vaapi capabilities,

Yeah, that's why I'm grilling DMG to figure out why it fails for him :)

@Dudemanguy Dudemanguy changed the title vo_dmabuf_wayland: implement a two-pass autoconvert and gpu format validation vo_dmabuf_wayland: reject formats not supported by the GPU Sep 8, 2024
@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 8, 2024

So it turns out that f_hwtransfer just had a small bug where it should have been skipping formats but wasn't and that lead to my problems... oops. Should be a lot simpler now.

Edit: bug is real but the fix is not correct. So WIP still.

@kasper93
Copy link
Contributor

kasper93 commented Sep 8, 2024

To summarize my findings and thoughts from IRC:

I was able to reproduce issue with Hyprland, which are fixed by wayland changes in this PR. f_hwtransfer seems to be forcing CPU conversions to VO supported format, which in nominal case shouldn't be preferable for vo=dmabuf-wayland. I understand Dudemanguy has renaming issue on his hardware with uploading yuv420p directly and relying on hw conversion, but I think fixing that with f_hwtransfer change as current in this PR will negatively affect hardware that is working correctly.

On my test system sway/plasma seem to support nv12, so it just works. Hyprland on the other hand, is not supporting nv12 and before this PR, it would still use it and produce wrong colors. After this PR it correctly reject nv12 and tries to use bgr. But the f_hwtransfer change is a regression, because it forces CPU yuv420p -> bgr0 conversion as it is not supported on hardware in hwdec=yes case and it is forced before upload in hwdec=no case. Without f_hwtransfer change, everything works as expected.

master (f6d9313)

hyprland (wrong colors, red):
--vo=dmabuf-wayland --hwdec=no

hwupload: upload yuv420p -> vaapi[nv12]
VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]

hyprland (wrong colors, red):
--vo=dmabuf-wayland --hwdec=yes

VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]

sway/plasma:
--vo=dmabuf-wayland --hwdec=no

hwupload: upload yuv420p -> vaapi[yuv420p]
VO: [dmabuf-wayland] 1920x1080 vaapi[yuv420p]

sway/plasma:
--vo=dmabuf-wayland --hwdec=yes

VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]

PR #14815

hyprland --vo=dmabuf-wayland --hwdec=no

  autoconvert: Converting yuv420p -> bgr0
     hwupload: upload bgr0 -> vaapi[bgr0]
      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[bgr0]

hyprland --vo=dmabuf-wayland --hwdec=yes

     hwupload: Hardware conversion: scale_vaapi -> bgr0
  autoconvert: Converting vaapi[nv12] -> vaapi[bgr0]
       ffmpeg: filter: Hardware does not support output format bgr0.
       ffmpeg: filter: Failed to configure output pad on filter
        lavfi: failed to configure the filter graph
           vf: Cannot convert decoder/filter output to any format supported by the output.
           vd: Attempting next decoding method after failure of h264-vaapi.
      cplayer: AO: [pipewire] 48000Hz stereo 2ch floatp
  autoconvert: HW-uploading to vaapi
  autoconvert: Converting yuv420p -> bgr0
     hwupload: upload bgr0 -> vaapi[bgr0]
      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[bgr0]

sway/plasma --vo=dmabuf-wayland --hwdec=no

     hwupload: upload yuv420p -> vaapi[nv12]
      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]`

sway/plasma --vo=dmabuf-wayland --hwdec=yes

      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]

PR #14815 (w/o f_hwtransfer change)

hyprland --vo=dmabuf-wayland --hwdec=no

     hwupload: upload yuv420p -> vaapi[bgr0]
      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[bgr0]

hyprland --vo=dmabuf-wayland --hwdec=yes

     hwupload: Hardware conversion: scale_vaapi -> rgb0
  autoconvert: Converting vaapi[nv12] -> vaapi[rgb0]
      cplayer: AO: [pipewire] 48000Hz stereo 2ch floatp
      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[rgb0]

sway/plasma --vo=dmabuf-wayland --hwdec=no

     hwupload: upload yuv420p -> vaapi[nv12]
      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]  

sway/plasma --vo=dmabuf-wayland --hwdec=yes

      cplayer: VO: [dmabuf-wayland] 1920x1080 vaapi[nv12]

@Dudemanguy Dudemanguy marked this pull request as draft September 9, 2024 21:24
@llyyr
Copy link
Contributor

llyyr commented Sep 9, 2024

We could probably merge wayland: use tranche_formats when getting compositor formats right now no? It's pretty standalone from the rest and fixes a real issue.

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 9, 2024

In theory, the hwupload probing should work after I unbreak my crap soon here. I just had to push the branch to test on another machine hence the draft (so nobody wastes times looking at obviously wrong code).

Edit: Should be functional now.

@Dudemanguy Dudemanguy force-pushed the dmabuf-autoconvert branch 5 times, most recently from d2ffa23 to 48001c5 Compare September 10, 2024 00:24
@Dudemanguy Dudemanguy marked this pull request as ready for review September 10, 2024 00:25
@Dudemanguy Dudemanguy force-pushed the dmabuf-autoconvert branch 3 times, most recently from 8f0fa63 to 16da43c Compare September 10, 2024 00:37
@Dudemanguy Dudemanguy merged commit c0c57be into mpv-player:master Sep 16, 2024
24 checks passed
@Dudemanguy Dudemanguy deleted the dmabuf-autoconvert branch September 16, 2024 00:07
@na-na-hi
Copy link
Contributor

After this PR I found that it inserts an extra conversion, because nv12 isn't supported by the drm primary plane on Haswell igpu, even though it also worked before this PR without wrong colors. Both the compositor and vaExportSurfaceHandle support nv12.

Before:

[   0.051][d][hwupload]   supports: nv12 yuv420p yuyv422 uyvy422 rgb0 bgr0
[   0.051][d][hwupload]   not supported by VO: rgba bgra yuv422p
[   0.065][i][cplayer] VO: [dmabuf-wayland] 1280x720 vaapi[nv12]

After:

[   0.064][d][hwupload]   supports: rgb0 bgr0
[   0.064][d][hwupload]   not supported by VO: nv12 yuv420p yuyv422 uyvy422 rgba bgra yuv422p
[   0.064][v][hwupload] Hardware conversion: scale_vaapi -> rgb0
[   0.064][i][autoconvert] Converting vaapi[nv12] -> vaapi[rgb0]
[   0.082][i][cplayer] VO: [dmabuf-wayland] 1280x720 vaapi[rgb0]

@Dudemanguy
Copy link
Member Author

What does drm_info show on that machine?

@na-na-hi
Copy link
Contributor

Primary:

    │       ├───"IN_FORMATS" (immutable): blob = 32
    │       │   ├───I915_FORMAT_MOD_X_TILED (0x0100000000000001)
    │       │   │   ├───C8 (0x20203843)
    │       │   │   ├───RGB565 (0x36314752)
    │       │   │   ├───XRGB8888 (0x34325258)
    │       │   │   ├───XBGR8888 (0x34324258)
    │       │   │   ├───XRGB2101010 (0x30335258)
    │       │   │   ├───XBGR2101010 (0x30334258)
    │       │   │   └───XBGR16161616F (0x48344258)
    │       │   └───DRM_FORMAT_MOD_LINEAR (0x0000000000000000)
    │       │       ├───C8 (0x20203843)
    │       │       ├───RGB565 (0x36314752)
    │       │       ├───XRGB8888 (0x34325258)
    │       │       ├───XBGR8888 (0x34324258)
    │       │       ├───XRGB2101010 (0x30335258)
    │       │       ├───XBGR2101010 (0x30334258)
    │       │       └───XBGR16161616F (0x48344258)

Overlay:

    │       ├───"IN_FORMATS" (immutable): blob = 36
    │       │   ├───I915_FORMAT_MOD_X_TILED (0x0100000000000001)
    │       │   │   ├───XRGB8888 (0x34325258)
    │       │   │   ├───XBGR8888 (0x34324258)
    │       │   │   ├───XRGB2101010 (0x30335258)
    │       │   │   ├───XBGR2101010 (0x30334258)
    │       │   │   ├───XRGB16161616F (0x48345258)
    │       │   │   ├───XBGR16161616F (0x48344258)
    │       │   │   ├───YUYV (0x56595559)
    │       │   │   ├───YVYU (0x55595659)
    │       │   │   ├───UYVY (0x59565955)
    │       │   │   └───VYUY (0x59555956)
    │       │   └───DRM_FORMAT_MOD_LINEAR (0x0000000000000000)
    │       │       ├───XRGB8888 (0x34325258)
    │       │       ├───XBGR8888 (0x34324258)
    │       │       ├───XRGB2101010 (0x30335258)
    │       │       ├───XBGR2101010 (0x30334258)
    │       │       ├───XRGB16161616F (0x48345258)
    │       │       ├───XBGR16161616F (0x48344258)
    │       │       ├───YUYV (0x56595559)
    │       │       ├───YVYU (0x55595659)
    │       │       ├───UYVY (0x59565955)
    │       │       └───VYUY (0x59555956)

Cursor:

    │       ├───"IN_FORMATS" (immutable): blob = 42
    │       │   └───DRM_FORMAT_MOD_LINEAR (0x0000000000000000)
    │       │       └───ARGB8888 (0x34325241)

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 26, 2024

I'm not sure what we can do then unless someone has a better way of getting what formats will actually work via dmabuf.

@kasper93
Copy link
Contributor

@na-na-hi: Do you know which part of code is rejecting nv12 as supported now?

@Dudemanguy
Copy link
Member Author

It's the format probing via DRM for him since NV12 isn't listed in any of the planes. Note that vaExportSurfaceHandle on NV12 works just fine here on my machine but will end up with broken colors which was the whole motivation behind this PR in the first place.

@kasper93
Copy link
Contributor

kasper93 commented Sep 26, 2024

What if we use tranche_formats? Are they working for you without GPU format check or both were needed?

@Dudemanguy
Copy link
Member Author

Needs both. tranche_formats only sends what works on the compositor's side.

@kasper93
Copy link
Contributor

kasper93 commented Sep 26, 2024

Ok... @na-na-hi you are on your own in this case ¯\(ツ)

@Dudemanguy
Copy link
Member Author

I did not personally pinpoint where it happens but my assumption is that something under the hood on the driver level goes wrong when vo_dmabuf_wayland creates the buffer using the linux-dmabuf protocol. It's possible that for na-na-hi it actually wasn't really working but it just happened to land on some format that didn't completely break colors.

@philipl
Copy link
Member

philipl commented Sep 26, 2024

But if the planes don't declare native support, the compositor must be converting NV12, right? It can't be going through directly.

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 26, 2024

In my case, the "red colors bug" happened on all compositors no matter what so I thought it wasn't compositor-dependent but probably something in drm. Since mpv does specifically create the buffer with the NV12 drm format in this case I presume that's what all compositors treat it as. Of course, there is nothing stopping a compositor from doing some other conversion if it wanted.

@na-na-hi
Copy link
Contributor

na-na-hi commented Sep 26, 2024

TBH I'm not sure why it works on my hardware without conversion by mpv. It might be that the gpu driver/vaapi are doing some magic conversion behind the scene but I haven't looked at the code to confirm this. I have reverted locally for now, but it would be mostly placebo if this PR just moves that conversion to mpv and there isn't really an extra conversion overall.

@kasper93
Copy link
Contributor

But if the planes don't declare native support, the compositor must be converting NV12, right? It can't be going through directly.

Yeah, that's why I asked. If compositor accepts nv12 in tranche_formats, wouldn't it be its job to ensure it is displayed correctly and supported by GPU?

@philipl
Copy link
Member

philipl commented Sep 26, 2024

TBH I'm not sure why it works on my hardware without conversion by mpv. It might be that the gpu driver/vaapi are doing some magic conversion behind the scene but I haven't looked at the code to confirm this. I have reverted locally for now, but it would be mostly placebo if this PR just moves that conversion to mpv and there isn't really an extra conversion overall.

Yeah. Given that you're looking at old Haswell hardware, it's much more likely that the conversion is happening further down the pipeline than that there is some native support that is somehow missing or worse in newer hardware generations.

That said, you would almost certainly want to use scale_vaapi to fully control the YUV->RGB conversion and ensure the right colourspace is being used vs whatever implicit magic is happening in the nv12 -> bgr0 conversion. But in either case, the conversion is fully hardware offloaded.

@Dudemanguy
Copy link
Member Author

Yeah, that's why I asked. If compositor accepts nv12 in tranche_formats, wouldn't it be its job to ensure it is displayed correctly and supported by GPU?

Well the protocol here only states that it is formats "the compositor supports". Of course, nothing is stopping a compositor from saying that formats on a certain device is not supported and thus not send the format in question but in practice it doesn't seem anyone implements that way. But if they suddenly did, nanahi would still see the same change in behavior.

To be honest, it's probably better that compositors don't touch the buffer. The entire point of this VO is so mpv and the compositor can share the dmabuf and ideally it can be directly scanned out in the end so it can be zero-copy. The only thing I'd expect a compositor to possibly do is maybe handle tonemapping for hdr one day but that's still up in the air of course.

@kasper93
Copy link
Contributor

I agree. On this point maybe it would be good to revise our scale_vaapi filter options? Currently it sets only output format, which may not always be accurate. Those color options would be good to set https://github.com/FFmpeg/FFmpeg/blob/d103b61cd84990b0dfce52315b18fdeaffa64f05/libavfilter/vf_scale_vaapi.c#L239-L270 This would likely fix this #14849

To be honest, it's probably better that compositors don't touch the buffer. The entire point of this VO is so mpv and the compositor can share the dmabuf and ideally it can be directly scanned out

To double down on this I think we should also use scale_vaapi to scale video to output size, to avoid report about low quality scaling done by compositors. Settings those options https://github.com/FFmpeg/FFmpeg/blob/d103b61cd84990b0dfce52315b18fdeaffa64f05/libavfilter/vf_scale_vaapi.c#L220-L236 it even has quality levels, that could be exposed in config. This would likely fix issues like this #13745

@na-na-hi
Copy link
Contributor

Yeah. Given that you're looking at old Haswell hardware, it's much more likely that the conversion is happening further down the pipeline than that there is some native support that is somehow missing or worse in newer hardware generations.

That said, you would almost certainly want to use scale_vaapi to fully control the YUV->RGB conversion and ensure the right colourspace is being used vs whatever implicit magic is happening in the nv12 -> bgr0 conversion. But in either case, the conversion is fully hardware offloaded.

Thanks for your valuable input. I guess this points out some inconsistencies between driver implementations for this feature. (Side note: I've been playing with vaapi filters recently to enhance vo_dmabuf_wayland with vaapi "hq" scaler and equalizers, and am currently also looking at colorspace conversions.)

To double down on this I think we should also use scale_vaapi to scale video to output size, to avoid report about low quality scaling done by compositors.

From my experience with the "hq" quality setting for vaapi, this would be a valuable thing to have. It looks similar to lanczos for upscaling and downscaling has much less aliasing than naive bilinear, while at the same time it performs much better (avoids framedrops at large resolutions) and uses much less power on my setup compared to mpv's shader scalers. (~0.5W vs ~5W in GPU power according to intel_gpu_top)

@Dudemanguy
Copy link
Member Author

We could definitely integrate the vaapi filter some more into vo_dmabuf_wayland if appropriate. But a lot of things likely won't work on AMD because they don't bother to implement it.

@kasper93
Copy link
Contributor

kasper93 commented Sep 26, 2024

From my experience with the "hq" quality setting for vaapi, this would be a valuable thing to have. It looks similar to lanczos for upscaling and downscaling has much less aliasing than naive bilinear, while at the same time it performs much better (avoids framedrops at large resolutions) and uses much less power on my setup compared to mpv's shader scalers. (~0.5W vs ~5W in GPU power according to intel_gpu_top)

That would be consistent what can be observed on Windows too. If that mode would be available, I would switch to dmabuf myself, because it would be complete, soon we will get HDR support too.

@rhjdvsgsgks
Copy link

ra_compatible_format didnt considered the case which drm dont support the format but compositor support and did the conversion. eg https://invent.kde.org/plasma/kwin/-/merge_requests/6352 . thats the reason why i have only rgb30 in #14865 and #14849

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 28, 2024

Yes, the entire point of this change was to reject formats that weren't supported by drm. You got a format conversion and that's what we want. Note that your problem is hdr which is broken and has always been broken in vo_dmabuf_wayland.

@mahkoh
Copy link
Contributor

mahkoh commented Sep 29, 2024

As I explained before, mpv looking at primary planes is fundamentally broken if there is a compositor between mpv and the primary plane. The situation reported by @na-na-hi is one that I mentioned when you proposed this logic the first time.

mpv has no idea how the compositor is using the buffer. The compositor might want to put the buffer on an overlay plane or a primary plane. It might want to scan the buffer out on GPU A or GPU B. When the user has multiple GPUs this property will change at runtime.

The question, what formats are "supported by the GPU" is not well-posed in the first place since there are many different ways in which a GPU can support a format and mpv has no idea which one of those is most relevant in what situation or which is the most efficient in what situation. This situation can change depending on whether the mpv window is fullscreen or being scanned out directly. It can also change when the compositor moves the mpv window between displays.

If the purpose of vo_dmabuf_wayland is to be the most efficient, then the only way to do this is to use dmabuf_feedback at runtime. If the feedback changes at runtime, mpv must change its output format accordingly to guarantee presentation being optimal. Otherwise there is no way for mpv to implement this logic since mpv lacks critical information.

I'm not sure what we can do then unless someone has a better way of getting what formats will actually work via dmabuf.

If the purpose of this PR is to work around bugs in the graphics driver for your GPU, report that bug to the driver developers instead. If the purpose of this PR is to work around bugs in your compositor, report that bug to the compositor developers instead.

@mahkoh
Copy link
Contributor

mahkoh commented Sep 29, 2024

Yeah, that's why I asked. If compositor accepts nv12 in tranche_formats, wouldn't it be its job to ensure it is displayed correctly and supported by GPU?

Well the protocol here only states that it is formats "the compositor supports". Of course, nothing is stopping a compositor from saying that formats on a certain device is not supported and thus not send the format in question but in practice it doesn't seem anyone implements that way.

Every compositor implements it that way.

Every compositor contains a hard-coded list of formats it supports 1 2. This is required because different formats required different logic.

At runtime, it intersects that list with the formats supported by the graphics driver 3 4.

For direct scanout, it intersects that list additionally with the formats supported by hardware planes 5 6.

As you can see, this also involves intersecting modifiers. AFAIK, at least on Intel, only a subset of modifiers are supported for scanout and this subset might change depending on the number of connected displays due to bandwidth limitations. It is up to the compositor to detect this and tell the client to use different modifiers if it wants to support direct scanout in such situations. As far as I can tell, this PR does not look at modifiers at all.

@Dudemanguy
Copy link
Member Author

Dudemanguy commented Sep 29, 2024

Every compositor implements it that way.

I get exactly the same list of formats across multiple devices in wlroots with the default renderer and weston. There are different modifiers (which we do check against) but I never thought it actually filtered out formats or was supposed to. Now if looking at drm formats on plane is fundamentally wrong then I'm at a bit of a loss what to do. There are several formats I get from tranche_formats definitely don't work when uploading to hardware. My understanding of the protocol and the behavior seen by the compositors never lead me to conclude that the compositor was filtering these things for us. I had thought support on the planes was necessary for that.

As for multi-gpu/runtime stuff, the limitations are known and have always existed. Indeed it's on the todo list one day to make this better.

@mahkoh
Copy link
Contributor

mahkoh commented Sep 30, 2024

There are several formats I get from tranche_formats definitely don't work when uploading to hardware.

Report such bugs to the components that sit between you and the display. If this PR fixed the "monochrome video" issue on your setup, then your hardware likely does not support direct scanout for the affected format. In that case, the compositor is compositing and the bug is either that the compositor does not implement the required logic or the OpenGL driver does not do the required conversion correctly. In OpenGL, seeing an all-red video is a classic symptom of the shader not using samplerExternalOES. But at least wlroots should do this correctly and this is more likely a bug in the driver (or mpv or one of the libraries used by mpv.)

If you are using old hardware, it is possible that driver bugs are never going to get fixed and it might also be necessary to add an mpv flag to blacklist such formats. Or tell users not to use the VO on such hardware.

For a wayland client, using dmabuf_feedback is the one and only way to go when selecting formats and modifiers.

PS: When I had support for nv12 in my OpenGL renderer, mpv worked fine on my RX580 with vo_dmabuf_wayland. I've since removed support for such formats since I did not want to support them in my Vulkan renderer.

@Dudemanguy
Copy link
Member Author

I don't think anyone was doing anything wrong here except mpv. The compositor gives us format + modifier pairs, but it cannot know that none of them are going to work correctly for our purpose since we are drawing the image in a special way without using the usual graphics APIs. The drivers/hardware isn't bugged. It just lacked support for what we were trying to do. The old format selection code was not correct because it ignored this completely. My mistake with this PR was that I filtered them too much. This has since been corrected so that planar formats are only chosen as a fallback if the compositor supports the format and if there are no valid format + modifier pairs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants