summaryrefslogtreecommitdiffstats
path: root/video/out/vulkan/ra_vk.c
Commit message (Collapse)AuthorAgeFilesLines
* vo_gpu: add RA_CAP for gl_NumWorkGroupsNiklas Haas2018-02-051-1/+1
| | | | | SPIRV-Cross doesn't support this for the time being. It's possible this could go away again at a later date.
* vo_gpu: vulkan: correctly enable textureGatherOffsetNiklas Haas2018-02-051-2/+2
| | | | This also requires a vulkan feature / SPIR-V capability to function
* vo_gpu: vulkan: don't issue queries for unused timersNiklas Haas2018-02-051-5/+13
| | | | | | | The vulkan validation layers warn you if you try requesting a query result from a timer that hasn't even been started yet, so we have to do some extra bit of work to keep track of which indices we've seen so far, and avoid the queries on them.
* vo_gpu: vulkan: add missing buffer barrier fieldsNiklas Haas2018-02-051-0/+2
| | | | These were accidentally omitted.
* vo_gpu: vulkan: fix some image barrier odditiesNiklas Haas2017-12-251-10/+5
| | | | | | | | | | | A vulkan validation layer update pointed out that this was wrong; we still need to use the access type corresponding to the stage mask, even if it means our code won't be able to skip the pipeline barrier (which would be wrong anyway). In additiona to this, we're also not allowed to specify any source access mask when transitioning from top_of_pipe, which doesn't make any sense anyway.
* vo_gpu: vulkan: fix dummyPass creationNiklas Haas2017-12-251-1/+1
| | | | This violates vulkan spec
* vo_gpu: vulkan: prefer vkCmdCopyImage over vkCmdBlitImageNiklas Haas2017-12-251-8/+31
| | | | | | blit() implies scaling, copy() is the equivalent command to use when the formats are compatible (same pixel size) and the rects have the same dimensions.
* vo_gpu: attempt re-using the FBO format for p->output_texNiklas Haas2017-12-251-0/+1
| | | | | | | | | This allows RAs with support for non-opaque FBO formats to use a more appropriate FBO format for the output tex, possibly enabling a more efficient blit operation. This requires distinguishing between real formats (which can be used to create textures) and fake formats (e.g. ra_gl's FBO hack).
* vo_gpu: vulkan: properly depend on the swapchain acquire semaphoreNiklas Haas2017-12-251-0/+13
| | | | | This is now associated with the ra_tex directly and used in the correct way, rather than hackily done from submit_frame.
* vo_gpu: vulkan: use correct access flag for presentNiklas Haas2017-12-251-2/+3
| | | | This needs VK_ACCESS_MEMORY_READ_BIT (spec)
* vo_gpu: aggressively prefer async computeNiklas Haas2017-12-251-1/+6
| | | | | | | | | | On AMD devices, we only get one graphics pipe but several compute pipes which can (in theory) run independently. As such, we should prefer compute shaders over fragment shaders in scenarios where we expect them to be better for parallelism. This is amusingly trivial to do, and actually improves performance even in a single-queue scenario.
* vo_gpu: vulkan: support split command poolsNiklas Haas2017-12-251-32/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using a single primary queue, we generate multiple vk_cmdpools and pick the right one dynamically based on the intent. This has a number of immediate benefits: 1. We can use async texture uploads 2. We can use the DMA engine for buffer updates 3. We can benefit from async compute on AMD GPUs Unfortunately, the major downside is that due to the lack of QF ownership tracking, we need to use CONCURRENT sharing for all resources (buffers *and* images!). In theory, we could try figuring out a way to get rid of the concurrent sharing for buffers (which is only needed for compute shader UBOs), but even so, the concurrent sharing mode doesn't really seem to have a significant impact over here (nvidia). It's possible that other platforms may disagree. Our deadlock-avoidance strategy is stupidly simple: Just flush the command every time we need to switch queues, and make sure all submission and callbacks happen in FIFO order. This required lifting the cmds_pending and cmds_queued out from vk_cmdpool to mpvk_ctx, and some functions died/got moved as a result, but that's a relatively minor change. On my hardware this is a fairly significant performance boost, mainly due to async transfers. (Nvidia doesn't expose separate compute queues anyway). On AMD, this should be a performance boost as well due to async compute.
* vo_gpu: allow invalidating FBO in renderpass_runNiklas Haas2017-12-251-5/+15
| | | | | | | | | This is especially interesting for vulkan since it allows completely skipping the layout transition as part of the renderpass. Unfortunately, that also means it needs to be put into renderpass_params, as opposed to renderpass_run_params (unlike #4777). Closes #4777.
* vo_gpu: vulkan: properly track image dependenciesNiklas Haas2017-12-251-39/+117
| | | | | | | | | | | | | | | | | | | | | | | | This uses the new vk_signal mechanism to order all access to textures. This has several advantageS: 1. It allows real synchronization of image access across multiple frames when using multiple queues for parallelism. 2. It allows using events instead of pipeline barriers, which is a finer-grained synchronization primitive that allows for more efficient layout transitions over longer durations. This commit also restructures some of the implicit transition code for renderpasses to be more flexible and correct. (Note: this technically drops the ability to transition the image out of undefined layout when not blending, but that was a bug anyway and needs to be done properly) vo_gpu: vulkan: remove no-longer-true optimization The change to the output_tex format makes this no longer true, and it actually seems to hurt performance now as well. So just don't do it anymore. I also realized it hurts performance when drawing an OSD, so it's probably not a good idea anyway.
* vo_gpu: vulkan: refactor command submissionNiklas Haas2017-12-251-34/+10
| | | | | | | | | | | | | | | Instead of being submitted immediately, commands are appended into an internal submission queue, and the actual submission is done once per frame (at the same time as queue cycling). Again, the benefits are not immediately obvious because nothing benefits from this yet, but it will make more sense for an upcoming vk_signal mechanism. This also cleans up the way the ra_vk submission interacts with the synchronization/callbacks from the ra_vk_ctx. Although currently, the way the dependency is signalled is a bit hacky: normally it would be associated with the ra_tex itself and waited on in the appropriate stage implicitly. But that code is just temporary, so I'm keeping it in there for a better commit order.
* vo_gpu: vulkan: reorganize vk_cmd slightlyNiklas Haas2017-12-251-8/+7
| | | | | | | | Instead of associating a single VkSemaphore with every command buffer and allowing the user to ad-hoc wait on it during submission, make the raw semaphores-to-signal array work like the raw semaphores-to-wait-on array. Doesn't really provide a clear benefit yet, but it's required for upcoming modifications.
* vo_gpu: vulkan: refactor vk_cmdpoolNiklas Haas2017-12-251-1/+1
| | | | | | 1. No more static arrays (deps / callbacks / queues / cmds) 2. Allows safely recording multiple commands at the same time 3. Uses resources optimally by never over-allocating commands
* vo_gpu: d3d11: add RA caps for ra_d3d11James Ross-Gowan2017-11-071-1/+5
| | | | | | | | | | | | | | | | ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is then translated to HLSL. This means it always exposes the same GLSL version that the SPIR-V compiler supports (4.50 for shaderc/glslang.) Despite claiming to support GLSL 4.50, some features that are tied to the GLSL version in OpenGL are not supported by ra_d3d11 when targeting legacy Direct3D feature levels. This includes two features that mpv relies on: - Reading from gl_FragCoord in the fragment shader (requires FL 10_0) - textureGather from any texture component (requires FL 11_0) These features have been exposed as new RA caps.
* vo_gpu: export the GLSL format qualifier for ra_formatJames Ross-Gowan2017-11-071-0/+2
| | | | | | | | | | | | | | | Backported from @haasn's change to libplacebo, except in the current RA, there's nothing to indicate an ra_format can be bound as a storage image, so there's no way to force all of these formats to have a glsl_format. Instead, the layout qualifier will be removed if glsl_format is NULL. This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while loading float values from unorm images often works as expected, it's technically undefined behaviour, and in Windows 10, it will cause the debug layer to spam the log with error messages. Also, apparently in GLSL, the format name must match the image's format exactly (but in Direct3D, it just has to have the same component type.)
* vo_gpu: add namespace query mechanismJames Ross-Gowan2017-11-071-0/+6
| | | | | | Backported from @haasn's change to libplacebo. More flexible than the previous "shared || non-shared" distinction. The extra flexibility is needed for Direct3D 11, but it also doesn't hurt code-wise.
* vo_gpu: vulkan: normalize use of *Flags and *FlagBitsNiklas Haas2017-09-271-13/+12
| | | | | | | | | | | | FlagBits is just the name of the enum. The actual data type representing a combination of these flags follows the *Flags convention. (The relevant difference is that the latter is defined to be uint32_t instead of left implicit) For consistency, use *Flags everywhere instead of randomly switching between *Flags and *FlagBits. Also fix a wrong type name on `stageFlags`, pointed out by @atomnuker
* vo_gpu: vulkan: optimize redundant pipeline barriersNiklas Haas2017-09-261-5/+15
| | | | | | | | | | | | | | | Using renderpass layout transitions is more optimal and doesn't require a redundant pipeline barrier. Since our render passes are static and don't change throughout the lifetime of a ra_renderpass, we unfortunately don't have much flexibility here - so just hard-code SHADER_READ_ONLY_OPTIMAL as the output format as this will be the most common case. We also can't short-circuit the transition when we need to preserve the framebuffer contents, since that depends on the current layout; so we still use an explicit tex_barrier in this case. (Most optimal for this scenario would be an input attachment anyway)
* vo_gpu: vulkan: add support for push constantsNiklas Haas2017-09-261-0/+15
| | | | Can in theory avoid updating the uniform buffer every frame
* vo_gpu: vulkan: generalize SPIR-V compilerNiklas Haas2017-09-261-77/+198
| | | | | | | | | | | | | | In addition to the built-in nvidia compiler, we now also support a backend based on libshaderc. shaderc is sort of like glslang except it has a C API and is available as a dynamic library. The generated SPIR-V is now cached alongside the VkPipeline in the cached_program. We use a special cache header to ensure validity of this cache before passing it blindly to the vulkan implementation, since passing invalid SPIR-V can cause all sorts of nasty things. It's also designed to self-invalidate if the compiler gets better, by offering a catch-all `int compiler_version` that implementations can use as a cache invalidation marker.
* vo_gpu: vulkan: initial implementationNiklas Haas2017-09-261-0/+1590
This time based on ra/vo_gpu. 2017 is the year of the vulkan desktop! Current problems / limitations / improvement opportunities: 1. The swapchain/flipping code violates the vulkan spec, by assuming that the presentation queue will be bounded (in cases where rendering is significantly faster than vsync). But apparently, there's simply no better way to do this right now, to the point where even the stupid cube.c examples from LunarG etc. do it wrong. (cf. https://github.com/KhronosGroup/Vulkan-Docs/issues/370) 2. The memory allocator could be improved. (This is a universal constant) 3. Could explore using push descriptors instead of descriptor sets, especially since we expect to switch descriptors semi-often for some passes (like interpolation). Probably won't make a difference, but the synchronization overhead might be a factor. Who knows. 4. Parallelism across frames / async transfer is not well-defined, we either need to use a better semaphore / command buffer strategy or a resource pooling layer to safely handle cross-frame parallelism. (That said, I gave resource pooling a try and was not happy with the result at all - so I'm still exploring the semaphore strategy) 5. We aggressively use pipeline barriers where events would offer a much more fine-grained synchronization mechanism. As a result of this, we might be suffering from GPU bubbles due to too-short dependencies on objects. (That said, I'm also exploring the use of semaphores as a an ordering tactic which would allow cross-frame time slicing in theory) Some minor changes to the vo_gpu and infrastructure, but nothing consequential. NOTE: For safety, all use of asynchronous commands / multiple command pools is currently disabled completely. There are some left-over relics of this in the code (e.g. the distinction between dev_poll and pool_poll), but that is kept in place mostly because this will be re-extended in the future (vulkan rev 2). The queue count is also currently capped to 1, because of the lack of cross-frame semaphores means we need the implicit synchronization from the same-queue semantics to guarantee a correct result.