vo_gpu: vulkan: hwdec_cuda: Add support for Vulkan interop

Despite their place in the tree, hwdecs can be loaded and used just fine by the vulkan GPU backend.

In this change we add Vulkan interop support to the cuda/nvdec hwdec.

The overall process is mostly straightforward, so the main observation here is that I had to implement it using an intermediate Vulkan buffer because direct VkImage usage is blocked by a bug in the nvidia driver. When that gets fixed, I will revisit this.

Nevertheless, the intermediate buffer copy is very cheap as it's all device memory from start to finish. Overall CPU utilisation is pretty much the same as with the OpenGL GPU backend.

Note that we cannot use a single intermediate buffer - rather there is a pool of them. This is done because the cuda memcpys are not explicitly synchronised with the texture uploads.

In the basic case, this doesn't matter because the hwdec is not asked to map and copy the next frame until after the previous one is rendered. In the interpolation case, we need extra future frames available immediately, so we'll be asked to map/copy those frames and vulkan will be asked to render them. So far, harmless, right? No. All the vulkan rendering, including the upload steps, is batched together and ends up running very asynchronously from the CUDA copies.

The end result is that all the copies happen one after another, and only then do the uploads happen, which means all textures receive the same, final, frame data. Whoops. Unsurprisingly this results in jerky motion because every 3/4 frames are identical.

The buffer pool ensures that we do not overwrite a buffer that is still waiting to be uploaded. The ra_buf_pool implementation automatically checks if existing buffers are available for use and only creates a new one if it really has to. It's hard to say for sure what the maximum number of buffers might be but we believe it won't be so large as to make this strategy unusable. The highest I've seen is 12 when using interpolation with tscale=bicubic.

A future optimisation here is to synchronise the CUDA copies with respect to the vulkan uploads. This can be done with shared semaphores that would ensure the copy of the second frame only happens after the upload of the first frame, and so on. This isn't trivial to implement as I'd have to first adjust the hwdec code to use asynchronous cuda; without that, there's no way to use the semaphore for synchronisation. This should result in fewer intermediate buffers being required.
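
To make the buffer pool reasoning above concrete, here is a minimal, self-contained sketch in plain C. It is not the ra_buf_pool code and does not touch CUDA or Vulkan: every name in it (`sketch_buf`, `sketch_pool`, `sketch_queue`, `sketch_map_frame`, `sketch_flush`, the `in_flight` flag and the fixed caps) is hypothetical and exists only for illustration. It assumes the "copy" happens immediately while the "upload" is merely queued and runs later in a batch, which is the ordering described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_BUFS 16    /* hypothetical cap, not taken from mpv */
#define SKETCH_MAX_PENDING 16 /* hypothetical cap, not taken from mpv */

/* Stand-in for an intermediate device buffer holding one frame. */
struct sketch_buf {
    int data;       /* frame contents, reduced to a frame id */
    bool in_flight; /* true while a queued upload still reads this buffer */
};

struct sketch_pool {
    struct sketch_buf bufs[SKETCH_MAX_BUFS];
    size_t num_bufs;
};

/* Uploads that have been requested but not yet executed. */
struct sketch_queue {
    struct sketch_buf *pending[SKETCH_MAX_PENDING];
    size_t num_pending;
};

/* Reuse a buffer no pending upload references; grow the pool only if needed. */
static struct sketch_buf *sketch_pool_get(struct sketch_pool *p)
{
    for (size_t i = 0; i < p->num_bufs; i++) {
        if (!p->bufs[i].in_flight)
            return &p->bufs[i];
    }
    if (p->num_bufs == SKETCH_MAX_BUFS)
        return NULL;
    struct sketch_buf *b = &p->bufs[p->num_bufs++];
    b->data = 0;
    b->in_flight = false;
    return b;
}

/* The copy into the buffer is immediate; the upload is only queued. */
static bool sketch_map_frame(struct sketch_pool *p, struct sketch_queue *q,
                             int frame)
{
    if (q->num_pending == SKETCH_MAX_PENDING)
        return false;
    struct sketch_buf *b = sketch_pool_get(p);
    if (!b)
        return false;
    b->data = frame;     /* models the synchronous copy */
    b->in_flight = true; /* must not be overwritten until uploaded */
    q->pending[q->num_pending++] = b;
    return true;
}

/* Run the batched uploads in order, then release every buffer for reuse.
 * Returns how many values were written to textures[] (at most max). */
static size_t sketch_flush(struct sketch_queue *q, int *textures, size_t max)
{
    assert(q != NULL && textures != NULL);
    size_t n = 0;
    for (size_t i = 0; i < q->num_pending; i++) {
        if (n < max)
            textures[n++] = q->pending[i]->data;
        q->pending[i]->in_flight = false;
    }
    q->num_pending = 0;
    return n;
}
```

Mapping frames 1, 2 and 3 before any flush grows the pool to three buffers, and the flush then yields 1, 2, 3 rather than three copies of frame 3, which is what a single shared buffer would produce. After the flush, the next mapped frame reuses an existing buffer instead of allocating a fourth.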
author: Philip Langdale <philipl@overt.org> 2018-09-29 18:00:19 -0700
committer: sfan5 <sfan5@live.de> 2018-10-22 21:35:48 +0200
commit: da1073c247523d07d0485348447fcc02000afee8 (patch)
tree: 42034a476202e2e03d6dd06a705b3ef1ef95da32 /DOCS/man
parent: 621389134afd3026b7e3508dba070442c4eeefa0 (diff)
download: mpv-da1073c247523d07d0485348447fcc02000afee8.tar.bz2
mpv-da1073c247523d07d0485348447fcc02000afee8.tar.xz
1 files changed, 14 insertions, 5 deletions
diff --git a/DOCS/man/options.rst b/DOCS/man/options.rst
index 19f972a891..7b24ce3480 100644
--- a/DOCS/man/options.rst
+++ b/DOCS/man/options.rst
@@ -750,7 +750,7 @@ Video
     likely works with Intel GPUs only. It also requires the opengl EGL backend.
 
     The ``cuda`` and ``cuda-copy`` modes provides deinterlacing in the decoder
-    which is useful as there is no other deinterlacing mechanism in the opengl
+    which is useful as there is no other deinterlacing mechanism in the gpu
     output path. To use this deinterlacing you must pass the option:
     ``vd-lavc-o=deint=[weave|bob|adaptive]``.
     Pass ``weave`` (or leave the option unset) to not attempt any
@@ -779,6 +779,11 @@ Video
         codecs. See ``--hwdec-codecs`` to enable hardware decoding for more
         codecs.
 
+    .. note::
+
+       Most non-copy methods only work with the OpenGL GPU backend. Currently,
+       only the ``nvdec`` and ``cuda`` methods work with Vulkan.
+
     .. admonition:: Quality reduction with hardware decoding
 
         In theory, hardware decoding does not reduce video quality (at least
@@ -888,14 +893,18 @@ Video
     format, with likely no advantages.
 
 ``--cuda-decode-device=<auto|0..>``
-    Choose the GPU device used for decoding when using the ``cuda`` hwdec.
+    Choose the GPU device used for decoding when using the ``cuda`` or
+    ``nvdec`` hwdecs with the OpenGL GPU backend.
 
-    By default, the device that is being used to provide OpenGL output will
+    By default, the device that is being used to provide ``gpu`` output will
     also be used for decoding (and in the vast majority of cases, only one
     GPU will be present).
 
-    Note that when using the ``cuda-copy`` hwdec, a different option must be
-    passed: ``--vd-lavc-o=gpu=<0..>``.
+    Note that when using the ``cuda-copy`` or ``nvdec-copy`` hwdec, a
+    different option must be passed: ``--vd-lavc-o=gpu=<0..>``.
+
+    Note that this option is not available with the Vulkan GPU backend. With
+    Vulkan, decoding must always happen on the display device.
 
 ``--vaapi-device=<device file>``
     Choose the DRM device for ``vaapi-copy``. This should be the path to a