author | Philip Langdale <philipl@overt.org> | 2018-09-29 18:00:19 -0700 |
---|---|---|
committer | sfan5 <sfan5@live.de> | 2018-10-22 21:35:48 +0200 |
commit | da1073c247523d07d0485348447fcc02000afee8 (patch) | |
tree | 42034a476202e2e03d6dd06a705b3ef1ef95da32 /DOCS/man | |
parent | 621389134afd3026b7e3508dba070442c4eeefa0 (diff) | |
vo_gpu: vulkan: hwdec_cuda: Add support for Vulkan interop
Despite their place in the tree, hwdecs can be loaded and used just
fine by the vulkan GPU backend.
In this change we add Vulkan interop support to the cuda/nvdec hwdec.
The overall process is mostly straightforward, so the main observation
here is that I had to implement it using an intermediate Vulkan buffer
because the direct VkImage usage is blocked by a bug in the nvidia
driver. When that gets fixed, I will revisit this.
Nevertheless, the intermediate buffer copy is very cheap as it's all
device memory from start to finish. Overall CPU utilisation is pretty
much the same as with the OpenGL GPU backend.
Note that we cannot use a single intermediate buffer - rather there
is a pool of them. This is done because the cuda memcpys are not
explicitly synchronised with the texture uploads.
In the basic case, this doesn't matter because the hwdec is not
asked to map and copy the next frame until after the previous one
is rendered. In the interpolation case, we need extra future frames
available immediately, so we'll be asked to map/copy those frames
and vulkan will be asked to render them. So far, harmless, right? No.
All the vulkan rendering, including the upload steps, is batched
together and ends up running very asynchronously from the CUDA copies.
The end result is that all the copies happen one after another, and
only then do the uploads happen, which means all textures are uploaded
with the same, final frame data. Whoops. Unsurprisingly, this results in
jerky motion because runs of 3 or 4 consecutive frames are identical.
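
The failure mode described above can be reproduced with a toy model. This is only an illustrative sketch, not mpv code: `render_batch` and `acquire_buffer` are hypothetical names, a one-element list stands in for a device buffer, and a plain assignment stands in for the CUDA memcpy. The only property it models is that copies run immediately while uploads are deferred until the whole batch has been recorded.

```python
def render_batch(frames, acquire_buffer):
    """Toy model: copies run right away, uploads run later as one batch."""
    pending_uploads = []               # uploads recorded now, executed later
    for frame in frames:
        buf = acquire_buffer()         # staging buffer the copy writes into
        buf[0] = frame                 # stands in for the CUDA memcpy
        pending_uploads.append(buf)    # the upload is only queued here
    # The batched uploads run only after every copy has finished.
    return [buf[0] for buf in pending_uploads]


single = [None]                        # one staging buffer shared by all frames
shared = render_batch(["f0", "f1", "f2"], lambda: single)

# A fresh buffer per frame (assumption: what a pool degenerates to when
# nothing has been uploaded yet).
separate = render_batch(["f0", "f1", "f2"], lambda: [None])

print(shared)    # ['f2', 'f2', 'f2']
print(separate)  # ['f0', 'f1', 'f2']
```

With the shared buffer, every queued upload reads the data of the last frame copied, which matches the identical-frames symptom.
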
The buffer pool ensures that we do not overwrite a buffer that is
still waiting to be uploaded. The ra_buf_pool implementation
automatically checks if existing buffers are available for use and
only creates a new one if it really has to. It's hard to say for sure
what the maximum number of buffers might be but we believe it won't
be so large as to make this strategy unusable. The highest I've seen
is 12 when using interpolation with tscale=bicubic.
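
The reuse-or-allocate behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the ra_buf_pool implementation: the `BufferPool` class, its `acquire`/`release` methods, and the explicit `busy` set are all hypothetical, and how the real pool learns that an upload has finished is not modelled.

```python
class BufferPool:
    """Hypothetical stand-in for a staging buffer pool; not mpv's API."""

    def __init__(self, size=16):
        self.size = size
        self.buffers = []    # every buffer allocated so far
        self.busy = set()    # indices of buffers with a pending upload

    def acquire(self):
        # Prefer an existing buffer whose upload has already completed.
        for index in range(len(self.buffers)):
            if index not in self.busy:
                self.busy.add(index)
                return index
        # Nothing is free, so grow the pool by exactly one buffer.
        self.buffers.append(bytearray(self.size))
        index = len(self.buffers) - 1
        self.busy.add(index)
        return index

    def release(self, index):
        # Assumed to be called once the upload has consumed the buffer.
        self.busy.discard(index)


pool = BufferPool()
a = pool.acquire()
b = pool.acquire()
c = pool.acquire()      # three frames in flight, so three buffers exist
pool.release(a)         # the first upload has finished
d = pool.acquire()      # reuses the released buffer instead of allocating
pool_size = len(pool.buffers)
```

In this model the pool size tracks the peak number of frames whose uploads are still pending, which is why the count grows when interpolation requests more future frames.
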
A future optimisation here is to synchronise the CUDA copies with
respect to the vulkan uploads. This can be done with shared semaphores
that would ensure the copy of the second frame only happens after the
upload of the first frame, and so on. This isn't trivial to implement
as I'd have to first adjust the hwdec code to use asynchronous cuda;
without that, there's no way to use the semaphore for synchronisation.
This should result in fewer intermediate buffers being required.
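
Why synchronisation should reduce the buffer count can be seen in a simple counting model. This is a sketch under strong assumptions: `max_buffers_in_flight` is a hypothetical helper, no real semaphores or CUDA streams are involved, and the synchronised case models the strictest ordering (each upload completes before the next copy starts), which a real implementation may relax.

```python
def max_buffers_in_flight(num_frames, synchronised):
    """Count the peak number of buffers copied into but not yet uploaded."""
    in_flight = 0
    peak = 0
    for _ in range(num_frames):
        in_flight += 1                  # copy a frame into a staging buffer
        peak = max(peak, in_flight)
        if synchronised:
            in_flight -= 1              # upload finishes before the next copy
    # Without synchronisation the uploads drain only after all the copies,
    # so every copied buffer was pending at the same time.
    return peak


unsync_peak = max_buffers_in_flight(4, synchronised=False)
sync_peak = max_buffers_in_flight(4, synchronised=True)
print(unsync_peak, sync_peak)  # 4 1
```
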
Diffstat (limited to 'DOCS/man')
-rw-r--r-- | DOCS/man/options.rst | 19 |
1 files changed, 14 insertions, 5 deletions
diff --git a/DOCS/man/options.rst b/DOCS/man/options.rst
index 19f972a891..7b24ce3480 100644
--- a/DOCS/man/options.rst
+++ b/DOCS/man/options.rst
@@ -750,7 +750,7 @@ Video
     likely works with Intel GPUs only. It also requires the opengl EGL backend.
 
     The ``cuda`` and ``cuda-copy`` modes provides deinterlacing in the decoder
-    which is useful as there is no other deinterlacing mechanism in the opengl
+    which is useful as there is no other deinterlacing mechanism in the gpu
     output path. To use this deinterlacing you must pass the option:
     ``vd-lavc-o=deint=[weave|bob|adaptive]``.
     Pass ``weave`` (or leave the option unset) to not attempt any
@@ -779,6 +779,11 @@ Video
     codecs. See ``--hwdec-codecs`` to enable hardware decoding for more
     codecs.
 
+    .. note::
+
+        Most non-copy methods only work with the OpenGL GPU backend. Currently,
+        only the ``nvdec`` and ``cuda`` methods work with Vulkan.
+
     .. admonition:: Quality reduction with hardware decoding
 
         In theory, hardware decoding does not reduce video quality (at least
@@ -888,14 +893,18 @@ Video
     format, with likely no advantages.
 
 ``--cuda-decode-device=<auto|0..>``
-    Choose the GPU device used for decoding when using the ``cuda`` hwdec.
+    Choose the GPU device used for decoding when using the ``cuda`` or
+    ``nvdec`` hwdecs with the OpenGL GPU backend.
 
-    By default, the device that is being used to provide OpenGL output will
+    By default, the device that is being used to provide ``gpu`` output will
     also be used for decoding (and in the vast majority of cases, only one GPU
     will be present).
 
-    Note that when using the ``cuda-copy`` hwdec, a different option must be
-    passed: ``--vd-lavc-o=gpu=<0..>``.
+    Note that when using the ``cuda-copy`` or ``nvdec-copy`` hwdec, a
+    different option must be passed: ``--vd-lavc-o=gpu=<0..>``.
+
+    Note that this option is not available with the Vulkan GPU backend. With
+    Vulkan, decoding must always happen on the display device.
 
 ``--vaapi-device=<device file>``
     Choose the DRM device for ``vaapi-copy``. This should be the path to a