path: root/video/out/vulkan/context.c
* wayland: only render if we have frame callback (Dudemanguy, 2020-09-21, 1 file, -0/+5)
Back in the olden days, mpv's wayland backend was driven by the frame callback. This had several issues and was removed in favor of the current approach, which allowed some advanced features (like display-resample and presentation time) to actually work properly. However, as a consequence, it meant that mpv always rendered, even if the surface was hidden. Wayland people consider this "wasteful" (and well, they aren't wrong). This commit aims to avoid wasteful rendering by doing some additional checks in the swapchain. There are three main parts to this.

1. Wayland EGL now uses an external swapchain (like the drm context). Before we start a new frame, we check to see if we are waiting on a callback from the compositor. If there is no wait, then go ahead and proceed to render the frame, swap buffers, and then initiate vo_wayland_wait_frame to poll (with a timeout) for the next potential callback. If we are still waiting on a callback from the compositor when starting a new frame, then we simply skip rendering it entirely until the surface comes back into view.

2. Wayland on vulkan has essentially the same approach, although the details are a little different. The ra_vk_ctx does not have support for an external swapchain, and although such a mechanism could theoretically be added, it doesn't make much sense with libplacebo. Instead, start_frame was added as a param and used to check for the callback.

3. For wlshm, it's simply a matter of adding frame callback support to it, leveraging vo_wayland_wait_frame, and using the frame callback value to decide whether or not to draw the image.
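A minimal sketch of the gating in point 1, under assumed names (this is illustrative, not the actual mpv code):

    #include <stdbool.h>
    #include <wayland-client.h>

    struct state {
        bool frame_wait; // true while waiting on the compositor's callback
    };

    // Before rendering: skip the frame entirely if the compositor never
    // acknowledged the previous one (e.g. because the surface is hidden).
    static bool start_frame(struct state *st)
    {
        return !st->frame_wait;
    }

    static void frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        struct state *st = data;
        wl_callback_destroy(cb);
        st->frame_wait = false; // compositor is ready for the next frame
    }

    static const struct wl_callback_listener frame_listener = {
        .done = frame_done,
    };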
* vo_gpu: vulkan: add ability to disable events (Niklas Haas, 2020-06-30, 1 file, -0/+5)
libplacebo exposes this feature already, because this particular type of bug is unusually common in practice. Simply make use of it by exposing it as an option.

Could probably also bump the libplacebo minimum version to get rid of the #if, but that would break debian oldoldstable or something.

Fixes #7867.
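A sketch of what wiring this through might look like, assuming libplacebo's pl_vulkan_params.disable_events field; the mpv option field name here is hypothetical:

    struct pl_vulkan_params params = pl_vulkan_default_params;
    // Work around implementations with broken VkEvent support by telling
    // libplacebo not to use events at all (hypothetical option field).
    params.disable_events = opts->disable_events;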
* options: change option macros and all option declarations (wm4, 2020-03-18, 1 file, -10/+10)
Change all OPT_* macros such that they don't define the entire m_option initializer, and instead expand only to a part of it, which sets certain fields. This requires changing almost every option declaration, because they all use these macros. A declaration now always starts with {"name", ... followed by designated initializers only (possibly wrapped in macros). The OPT_* macros now initialize the .offset and .type fields only, sometimes also .priv and others.

I think this change makes the option macros less tricky. The old code had to stuff everything into macro arguments (and attempted to allow setting arbitrary fields by letting the user pass designated initializers in the vararg parts). Some of this was made messy due to C99 and C11 not allowing 0-sized varargs with ',' removal. It's also possible that this change is pointless, other than cosmetic preferences.

Not too happy about some things. For example, the OPT_CHOICE() indentation I applied looks a bit ugly.

Much of this change was done with regex search&replace, but some places required manual editing. In particular, code in "obscure" areas (which I didn't include in compilation) might be broken now.

In wayland_common.c the author of some option declarations confused the flags parameter with the default value (though the default value was also properly set below). I fixed this with this change.
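For illustration, a schematic before/after of a single declaration in the two styles (the option shown is made up, and the exact old macro arguments varied by type):

    // old style: the OPT_* macro expanded to the entire m_option initializer
    OPT_INT("example-count", example_count, 0),

    // new style: the declaration starts with {"name", ..., and the macro
    // expands only to designated initializers for .type, .offset, etc.
    {"example-count", OPT_INT(example_count)},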
* vo_gpu: vulkan: set allow_suboptimal when possible (Niklas Haas, 2019-12-22, 1 file, -0/+5)
This was added in libplacebo v1.29.0 and allows making resizes slightly smoother for clients that already handle resize events (such as mpv).
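A sketch of the swapchain setup this corresponds to, assuming libplacebo's pl_vulkan_swapchain_params.allow_suboptimal field:

    struct pl_vulkan_swapchain_params params = {
        .surface = surface,
        .present_mode = VK_PRESENT_MODE_FIFO_KHR,
        // keep presenting on VK_SUBOPTIMAL_KHR instead of forcing an
        // immediate swapchain recreation; mpv already reacts to resize
        // events itself, so suboptimal frames are short-lived
        .allow_suboptimal = true,
    };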
* wayland: fix presentation time (Dudemanguy911, 2019-10-20, 1 file, -0/+9)
There are two stupid things here that need to be fixed. First of all, vulkan wasn't actually using presentation time, because somehow the get_vsync function in context.c disappeared. Secondly, if the mpv window was hidden, it was updating the ust time based on refresh_usec, but really it should simply not feed any information to the vsync info structure. So this adds some logic to guess whether or not a window is hidden.
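A sketch of what the hidden-window check might look like; the wayland-state field names are hypothetical, not the actual code:

    static void get_vsync(struct ra_ctx *ctx, struct vo_vsync_info *info)
    {
        struct vo_wayland_state *wl = ctx->vo->wl;
        // While hidden there is no real vsync; leave the info struct
        // untouched rather than synthesizing a ust from refresh_usec.
        if (wl->hidden) // hypothetical flag
            return;
        info->vsync_duration = wl->vsync_duration;
        info->skipped_vsyncs = wl->last_skipped_vsyncs;
        info->last_queue_display_time = wl->last_queue_display_time;
    }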
* wayland: use callback flag + poll for buffer swap (dudemanguy, 2019-10-10, 1 file, -3/+8)
The old way of using wayland in mpv relied on an external renderloop for semi-accurate timings. This had multiple issues though. Display sync would break whenever the window was hidden (since the frame callback stopped being executed), which was really annoying. Also, the entire external renderloop logic was kind of fragile and didn't play well with mpv's internal structure (i.e. using presentation time in that old paradigm breaks stats.lua).

Basically the problem is that swap buffers blocks on wayland, which is crap whenever you hide the mpv window, since it locks up the entire player. So you have to make swap buffers not block, but this has a different problem: timings will be terrible if you use the unblocked swap buffers call.

Based on some discussion in #wayland, the trick here is relatively simple and works well enough for our purposes. Instead, we basically build a way to block with a timeout in the wayland buffer swap functions. A bool is set in the frame callback function that indicates whether or not mpv is waiting for a frame to be displayed. In the actual buffer swap function, we enter into a while loop waiting for this flag to be set. At the same time, the wl_display is polled to block the thread and wake up if it receives any events from the compositor. This loop only breaks if enough time has passed or if the frame callback bool is received.

In the near future, it is better to set whether or not a frame has been displayed in the presentation feedback. However, as a first pass, doing it in the frame callback is more than good enough.

The "downside" is that we render frames that aren't actually shown on screen when the player is hidden (it seems like wayland people don't like that). But who cares. Accurate timings are way more important. It's probably not too hard to add that behavior back in the player though.
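A self-contained sketch of the block-with-timeout loop described above (names are illustrative; the real function and its timeout handling differ):

    #include <poll.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>
    #include <wayland-client.h>

    static int64_t now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    // Block until the compositor's frame callback clears *frame_wait or
    // the timeout expires, waking up on any event from the compositor.
    static void wait_frame(struct wl_display *display, bool *frame_wait,
                           int timeout_ms)
    {
        struct pollfd fd = {
            .fd = wl_display_get_fd(display),
            .events = POLLIN,
        };
        int64_t deadline = now_ms() + timeout_ms;
        while (*frame_wait) {
            int64_t remaining = deadline - now_ms();
            if (remaining <= 0)
                break; // timed out: give up on the callback for this frame
            wl_display_flush(display);
            if (poll(&fd, 1, (int)remaining) > 0)
                wl_display_dispatch(display); // may run the frame callback
        }
    }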
* vo: make swapchain-depth option generic for all VOs (Anton Kindestam, 2019-09-28, 1 file, -1/+1)
In preparation for making vo_drm able to use swapchain-depth.
* vo/gpu: vulkan: Pass the device name option through to libplacebo (Philip Langdale, 2019-08-24, 1 file, -0/+1)
We collect a 'vulkan-device' option today but then don't actually pass it on, so it's useless. Once that's fixed, it can be used to select a specific vulkan device by name.

Tested with the new nvidia offload feature to select between the nvidia and intel GPUs.
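Schematically, the fix amounts to forwarding the option, assuming libplacebo's pl_vulkan_params.device_name field (the mpv option field name is illustrative):

    struct pl_vulkan_params params = pl_vulkan_default_params;
    // select a specific VkPhysicalDevice by its reported name,
    // e.g. as given by --vulkan-device
    params.device_name = opts->device; // hypothetical option field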
* vo_gpu: vulkan: use libplacebo instead (Niklas Haas, 2019-04-21, 1 file, -414/+68)
This commit rips out the entire mpv vulkan implementation in favor of exposing lightweight wrappers on top of libplacebo instead, which provides much of the same except in a more up-to-date and polished form. This (finally) unifies the code base between mpv and libplacebo, which is something I've been hoping to do for a long time.

Note: The ra_pl wrappers are abstract enough from the actual libplacebo device type that we can in theory re-use them for other devices like d3d11 or even opengl in the future, so I moved them to a separate directory for the time being. However, the rest of the code is still vulkan-specific, so I've kept the "vulkan" naming and file paths, rather than introducing a new `--gpu-api` type (which would have ended up with significantly more code duplication). Plus, the code and functionality is similar enough that for most users this should just be a straight-up drop-in replacement.

Note: This commit excludes some changes; specifically, the updates to context_win and hwdec_cuda are deferred to separate commits for authorship reasons.
* vulkan: slightly improve vsync jitter measurements (Niklas Haas, 2018-11-19, 1 file, -0/+19)
By design, some vulkan implementations block until vsync during vkAcquireNextImageKHR. Since mpv only considers the time that `swap_buffers` spent blocking as constituting part of the vsync, we can help it out a bit by pre-emptively calling this function here in order to improve the accuracy of vsync jitter measurements on vulkan. (If it fails, we just ignore the error and have the user call it a second time later - maybe it will work then)

On my system this drops vsync-jitter from ~0.030 to ~0.007, an accuracy of +/- 100μs. (Which *might* have something to do with the fact that this is the polling interval for command polling)
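A simplified sketch of the idea, with error handling and state tracking elided:

    #include <stdbool.h>
    #include <vulkan/vulkan.h>

    // After presenting, speculatively acquire the next image so that any
    // vsync blocking inside vkAcquireNextImageKHR happens here, inside
    // swap_buffers, where mpv attributes it to the vsync interval.
    static void pre_acquire(VkDevice dev, VkSwapchainKHR swapchain,
                            VkSemaphore sem, uint32_t *idx, bool *acquired)
    {
        VkResult res = vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX,
                                             sem, VK_NULL_HANDLE, idx);
        // On failure, just ignore the error; the regular acquire at the
        // start of the next frame will retry (and report) it.
        *acquired = (res == VK_SUCCESS);
    }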
* vo_gpu: vulkan: only rotate the queues on swap (Niklas Haas, 2018-11-19, 1 file, -6/+9)
Makes performance slightly better when using multiple queues by avoiding unnecessary semaphores due to bad queue selection.

Also remove an aeons-old workaround for an nvidia bug that only ever existed in the earliest beta vulkan drivers anyway.
* vo_gpu: vulkan: suppress bogus error message on --vulkan-device (Niklas Haas, 2018-10-21, 1 file, -5/+5)
Since the code just broke out of the loop on a match rather than jumping straight to the end of the function body, it ended up hitting the code path for when the end of the list was reached.
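The shape of the bug, in schematic form (not the actual code, which jumps past the error path instead):

    // "break" on a match fell through to the same code path as exhausting
    // the list, so a *successful* match still printed the "not found"
    // error. Tracking the match explicitly avoids that:
    int found = -1;
    for (int i = 0; i < num_devices; i++) {
        if (strcmp(name, device_names[i]) == 0) {
            found = i;
            break;
        }
    }
    if (found < 0)
        fprintf(stderr, "Device '%s' not found\n", name);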
* vo_gpu: remove old window screenshot glue code and GL implementation (wm4, 2018-02-13, 1 file, -1/+0)
There is now a better way. Reading the front framebuffer was always a hack. The new code via VOCTRL_SCREENSHOT renders it into a FBO, which does not come with the disadvantages of reading the front buffer (like not being supported by GLES, or possibly black regions due to overlapping windows on some systems).

For now, keep VOCTRL_SCREENSHOT_WIN on the VO level, because there are still some lesser VOs and backends that use it.
* vo_gpu: vulkan: allow disabling async tf/comp (Niklas Haas, 2017-12-25, 1 file, -1/+8)
Async compute in particular seems to cause problems on some drivers, and even when supported, the benefits are not that massive from the tests I have seen, so it's probably safe to keep off by default.

Async transfer on the other hand seems to work better and offers a more substantial improvement, so it's kept on.
* vo_gpu: vulkan: properly depend on the swapchain acquire semaphore (Niklas Haas, 2017-12-25, 1 file, -15/+8)
This is now associated with the ra_tex directly and used in the correct way, rather than hackily done from submit_frame.
* vo_gpu: vulkan: make the swapchain more robust (Niklas Haas, 2017-12-25, 1 file, -23/+50)
Now handles both VK_ERROR_OUT_OF_DATE_KHR and VK_SUBOPTIMAL_KHR for both vkAcquireNextImageKHR and vkQueuePresentKHR in the correct way.
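A hypothetical sketch of the correct handling on the acquire side; vkQueuePresentKHR returns the same two codes and is handled analogously:

    #include <stdbool.h>
    #include <vulkan/vulkan.h>

    static bool acquire_image(VkDevice dev, VkSwapchainKHR swapchain,
                              VkSemaphore sem, uint32_t *idx,
                              bool *need_recreate)
    {
        VkResult res = vkAcquireNextImageKHR(dev, swapchain, UINT64_MAX,
                                             sem, VK_NULL_HANDLE, idx);
        switch (res) {
        case VK_SUCCESS:
            return true;
        case VK_SUBOPTIMAL_KHR:
            // The image is still usable this frame, but the swapchain no
            // longer matches the surface; recreate it afterwards.
            *need_recreate = true;
            return true;
        case VK_ERROR_OUT_OF_DATE_KHR:
            // The swapchain is unusable; it must be recreated before any
            // further acquire can succeed.
            *need_recreate = true;
            return false;
        default:
            return false; // a genuine error
        }
    }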
* vo_gpu: vulkan: support split command pools (Niklas Haas, 2017-12-25, 1 file, -5/+6)
Instead of using a single primary queue, we generate multiple vk_cmdpools and pick the right one dynamically based on the intent. This has a number of immediate benefits:

1. We can use async texture uploads
2. We can use the DMA engine for buffer updates
3. We can benefit from async compute on AMD GPUs

Unfortunately, the major downside is that due to the lack of QF ownership tracking, we need to use CONCURRENT sharing for all resources (buffers *and* images!). In theory, we could try figuring out a way to get rid of the concurrent sharing for buffers (which is only needed for compute shader UBOs), but even so, the concurrent sharing mode doesn't really seem to have a significant impact over here (nvidia). It's possible that other platforms may disagree.

Our deadlock-avoidance strategy is stupidly simple: Just flush the commands every time we need to switch queues, and make sure all submission and callbacks happen in FIFO order. This required lifting the cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some functions died/got moved as a result, but that's a relatively minor change.

On my hardware this is a fairly significant performance boost, mainly due to async transfers. (Nvidia doesn't expose separate compute queues anyway). On AMD, this should be a performance boost as well due to async compute.
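For reference, the CONCURRENT sharing mentioned above looks roughly like this at image creation (illustrative; other required fields elided into a comment):

    // With no queue-family ownership tracking, images are created with
    // CONCURRENT sharing across all queue families in use, so any queue
    // may touch them without explicit ownership transfers.
    VkImageCreateInfo info = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
        .imageType = VK_IMAGE_TYPE_2D,
        // ... format, extent, usage, etc. ...
        .sharingMode = VK_SHARING_MODE_CONCURRENT,
        .queueFamilyIndexCount = num_qfs,   // e.g. graphics/compute/transfer
        .pQueueFamilyIndices = qf_indices,
    };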
* vo_gpu: vulkan: properly track image dependencies (Niklas Haas, 2017-12-25, 1 file, -1/+4)
This uses the new vk_signal mechanism to order all access to textures. This has several advantages:

1. It allows real synchronization of image access across multiple frames when using multiple queues for parallelism.

2. It allows using events instead of pipeline barriers, which is a finer-grained synchronization primitive that allows for more efficient layout transitions over longer durations.

This commit also restructures some of the implicit transition code for renderpasses to be more flexible and correct. (Note: this technically drops the ability to transition the image out of undefined layout when not blending, but that was a bug anyway and needs to be done properly)

vo_gpu: vulkan: remove no-longer-true optimization

The change to the output_tex format makes this no longer true, and it actually seems to hurt performance now as well. So just don't do it anymore. I also realized it hurts performance when drawing an OSD, so it's probably not a good idea anyway.
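A minimal sketch of the event-based split barrier that point 2 refers to (the stage masks and variable names are illustrative):

    // Producer: signal the event once color writes have finished.
    vkCmdSetEvent(producer_cmd, event,
                  VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT);

    // Consumer: wait on it (possibly much later), instead of stalling
    // everything with a vkCmdPipelineBarrier at the point of use.
    vkCmdWaitEvents(consumer_cmd, 1, &event,
                    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                    VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                    0, NULL, 0, NULL, 0, NULL);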
* vo_gpu: vulkan: refactor command submission (Niklas Haas, 2017-12-25, 1 file, -4/+23)
Instead of being submitted immediately, commands are appended into an internal submission queue, and the actual submission is done once per frame (at the same time as queue cycling). Again, the benefits are not immediately obvious because nothing benefits from this yet, but it will make more sense for an upcoming vk_signal mechanism.

This also cleans up the way the ra_vk submission interacts with the synchronization/callbacks from the ra_vk_ctx. Although currently, the way the dependency is signalled is a bit hacky: normally it would be associated with the ra_tex itself and waited on in the appropriate stage implicitly. But that code is just temporary, so I'm keeping it in there for a better commit order.
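A hypothetical sketch of the batched flush: rather than one vkQueueSubmit per recorded command buffer, everything queued during the frame is submitted together.

    #include <vulkan/vulkan.h>

    static void flush_commands(VkQueue queue, VkCommandBuffer *cmds,
                               uint32_t num_cmds, VkFence fence)
    {
        VkSubmitInfo info = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .commandBufferCount = num_cmds,
            .pCommandBuffers = cmds, // everything appended this frame
        };
        vkQueueSubmit(queue, 1, &info, fence);
    }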
* vo_gpu: vulkan: reorganize vk_cmd slightly (Niklas Haas, 2017-12-25, 1 file, -20/+25)
Instead of associating a single VkSemaphore with every command buffer and allowing the user to ad-hoc wait on it during submission, make the raw semaphores-to-signal array work like the raw semaphores-to-wait-on array.

Doesn't really provide a clear benefit yet, but it's required for upcoming modifications.
* vo_gpu: vulkan: refactor vk_cmdpool (Niklas Haas, 2017-12-25, 1 file, -6/+6)
1. No more static arrays (deps / callbacks / queues / cmds)
2. Allows safely recording multiple commands at the same time
3. Uses resources optimally by never over-allocating commands
* vo_gpu: vulkan: reword comment (Niklas Haas, 2017-09-29, 1 file, -3/+4)
This is fixed upstream (and we now know it's a driver bug), so reword the comment.
* vo_gpu: vulkan: normalize use of *Flags and *FlagBits (Niklas Haas, 2017-09-27, 1 file, -2/+2)
FlagBits is just the name of the enum. The actual data type representing a combination of these flags follows the *Flags convention. (The relevant difference is that the latter is defined to be uint32_t instead of left implicit)

For consistency, use *Flags everywhere instead of randomly switching between *Flags and *FlagBits.

Also fix a wrong type name on `stageFlags`, pointed out by @atomnuker
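Concretely, the convention looks like this (using the shader stage flags as an example):

    // VkShaderStageFlagBits names the individual enum constants; a value
    // holding a *combination* of them should be typed VkShaderStageFlags,
    // which is explicitly defined as a uint32_t bitmask.
    VkShaderStageFlags stageFlags =
        VK_SHADER_STAGE_VERTEX_BIT | VK_SHADER_STAGE_FRAGMENT_BIT;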
* vo_gpu: vulkan: generalize SPIR-V compiler (Niklas Haas, 2017-09-26, 1 file, -0/+16)
In addition to the built-in nvidia compiler, we now also support a backend based on libshaderc. shaderc is sort of like glslang, except it has a C API and is available as a dynamic library.

The generated SPIR-V is now cached alongside the VkPipeline in the cached_program. We use a special cache header to ensure validity of this cache before passing it blindly to the vulkan implementation, since passing invalid SPIR-V can cause all sorts of nasty things. It's also designed to self-invalidate if the compiler gets better, by offering a catch-all `int compiler_version` that implementations can use as a cache invalidation marker.
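A hypothetical layout for such a cache header; the real struct in mpv may differ, but the invalidation logic is the same: reject the blob unless every field matches what the current build would write.

    struct spirv_cache_header {
        char compiler_id[32];   // which SPIR-V backend produced the blob
        int compiler_version;   // catch-all: bump to invalidate old caches
        size_t spirv_size;      // byte size of the SPIR-V that follows
    };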
* vo_gpu: vulkan: initial implementation (Niklas Haas, 2017-09-26, 1 file, -0/+501)
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop!

Current problems / limitations / improvement opportunities:

1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370)

2. The memory allocator could be improved. (This is a universal constant)

3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows.

4. Parallelism across frames / async transfer is not well-defined; we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy)

5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as an ordering tactic, which would allow cross-frame time slicing in theory)

Some minor changes to the vo_gpu and infrastructure, but nothing consequential.

NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2).

The queue count is also currently capped to 1, because the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result.