summaryrefslogtreecommitdiffstats
path: root/video/out/gpu
Commit message (Collapse)AuthorAgeFilesLines
* vo_gpu: d3d11: implement tex_download()James Ross-Gowan2018-02-132-83/+0
| | | | | | | This allows the new GPU screenshot functionality introduced in 9f595f3a80ee to work with the D3D11 backend. It replaces the old window screenshot functionality, which was shared between D3D11 and ANGLE. The old code can be removed, since it's not needed by ANGLE anymore either.
* vo_gpu: use a variable for the RA_CAP_FRAGCOORD flagJames Ross-Gowan2018-02-131-4/+3
| | | | | This is just a cosmetic change. Now the RA_CAP_FRAGCOORD check looks like all the others.
* vo_gpu: check for HDR peak detection in dumb mode tooJames Ross-Gowan2018-02-131-7/+8
| | | | | | | | Similar spirit to edb4970ca8b9. check_gl_features() has a confusing early-return. This also adds compute_hdr_peak to the list of options that is copied to the dumb-mode options struct, since it seems to make a difference. Otherwise it would be impossible to disable HDR peak detection in dumb mode.
* vo_gpu: make screenshots use the GL rendererwm42018-02-113-1/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | Using the GL renderer for color conversion will make sure screenshots will use the same conversion as normal video rendering. It can do this for all types of screenshots. The logic when to write 16 bit PNGs changes. To approximate the old behavior, we decide by looking whether the source video format has more than 8 bits per component. We apply this logic even for window screenshots. Also, 16 bit PNGs now always include an unused alpha channel. The reason is that FFmpeg has RGB48 and RGBA64 formats, but no RGB064. RGB48 is 3 bytes and usually not supported by GPUs for rendering, so we have to use RGBA64, which forces an alpha channel. Will break for users who use --target-trc and similar options. I considered creating a new gl_video context, but it could double GPU memory use, so I didn't. This uses FBOs instead of glGetTexImage(), because that increases the chance it could work on GLES (e.g. ANGLE). Untested. No support for the Vulkan and D3D11 backends yet. Fixes #5498. Also fixes #5240, because the code for reading back is not used with the new code path.
* vo_gpu: add internal ability to skip osd/subs for renderingwm42018-02-112-15/+37
| | | | Needed for the following commit.
* vo_gpu: use blit() only if target ra_tex supports itwm42018-02-111-2/+3
| | | | | Even if RA_CAP_BLIT is set, this might just not be enabled for the target ra_tex.
* vo_gpu: add memory barrier on the HDR peak detectionNiklas Haas2018-02-111-0/+1
| | | | | This can cause the peak detection state to be inconsistent in rare cases, which might explain the issues when taking screenshots in #5499.
* vo_gpu: correctly infer HDR peak detection supportNiklas Haas2018-02-111-1/+4
| | | | | | The re-ordering of commits e3d93fd and 0870859 ended up swallowing the change which made the HDR tone mapping algorithm actually check for RA_CAP_NUM_GROUPS support.
* vo_gpu: refactor HDR peak detection algorithmNiklas Haas2018-02-113-16/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The major changes are as follows: 1. Use `uint32_t` instead of `unsigned int` for the SSBO size calculation. This doesn't really matter, since a too-big buffer will still work just fine, but since `uint` is a 32-bit integer by definition this is the correct way to do it. 2. Pre-divide the frame_sum by the num_wg immediately at the end of a frame. This change was made to prevent overflow. At 4K screen size, this code is currently already very at risk of overflow, especially once I started playing with longer averaging sizes. Pre-dividing this out makes it just about fit into 32-bit even for worst-case PQ content. (It's technically also faster and easier this way, so I should have done it to begin with). Rename `frame_sum` to `frame_avg` to clearly signal the change in semantics. 3. Implement a scene transition detection algorithm. This basically compares the current frame's average brightness against the (averaged) value of the past frames. If it exceeds a threshold, which I experimentally configured, we reset the peak detection SSBO's state immediately - so that it just contains the current frame. This prevents annoying "eye adaptation"-like effects on scene transitions. 4. As a result of the previous change, we can now use a much larger buffer size by default, which results in a more stable and less flickery result. I experimented with values between 20 and 256 and settled on the new value of 64. (I also switched to a power-of-2 array size, because I like powers of two)
* vo_gpu: port HDR tone mapping algorithm from libplaceboNiklas Haas2018-02-053-70/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current peak detection algorithm was very bugged (which contributed to the excessive cross-frame flicker without long normalization) and also didn't take into account the frame average brightness level. The new algorithm both takes into account frame average brightness (in addition to peak brightness), and also computes the values in a more stable/correct way. (The old path was basically undefined behavior) In addition to improving the algorithm, we also switch to hable tone mapping by default, and try to enable peak computation automatically whever possible (compute shaders + SSBOs supported). We also make the desaturation milder, after extensive testing during libplacebo development. I also had to compensate a bit for the representational differences between mpv and libplacebo (libplacebo treats 1.0 as the reference peak, but mpv treats it as the nominal peak), but it shouldn't have caused any problems. This is still not quite the same as libplacebo, since libplacebo also allows tagging the desired scene average brightness on the output, and it also supports reading the scene average brightness from static metadata (MaxFALL) where available. But those changes are a bit more involved. It's possible we could also read this from metadata in the future, but we have problems communicating with AVFrames as it is and I don't want to touch the mpv colorimetry structs for the time being.
* vo_gpu: add RA_CAP for gl_NumWorkGroupsNiklas Haas2018-02-051-0/+1
| | | | | SPIRV-Cross doesn't support this for the time being. It's possible this could go away again at a later date.
* vo_gpu: check for RA_CAP_FRAGCOORD in dumb mode tooJames Ross-Gowan2018-01-301-13/+14
| | | | | | | The RA_CAP_FRAGCOORD checks apply to dumb mode as well, but they were after the check for dumb mode, which returns early, so they never ran. Fixes #5436
* video: fix crash with vdpau when reinitializing renderingwm42018-01-271-3/+3
| | | | | | | | | | Using vdpau will allocate additional textures for the reinterleaving step, which uninit_rendering() will free. This is a problem because the hwdec image remains mapped when reinitializing, so the reinterleaving textures are turned into dangling pointers. Fix this by freeing the reinterleave textures on full uninit instead. Fixes #5447.
* hwdec: detach d3d and d3d9 hwaccel from anglemyfreeer2018-01-251-1/+3
| | | | Fix https://github.com/mpv-player/mpv/issues/5420
* video: change some remaining vo_opengl mentions to vo_gpuAkemi2018-01-201-1/+1
|
* vo_gpu: skip DR for unsupported image formatswm42018-01-181-0/+3
| | | | | | | | | | | | | | DR (direct rendering) works by having the decoder decode into the GPU staging buffers, instead of copying the video data on texture upload. We did this even for formats unsupported by the GPU or the renderer. This "worked" because the staging memory is untyped, and the video frame was converted by libswscale to a supported format, and then uploaded with a copy using the normal non-DR texture upload path. Even though it "works", we don't gain anything from using the staging buffers for decoding, since we can't use them for upload anyway. Also, staging memory might be potentially limited (what really happens is up to the driver). It's easy to avoid, so just skip it in these cases.
* vo_gpu: fix broken 10 bit via integer textures playbackwm42018-01-171-3/+3
| | | | | | | | | | | The check_gl_features(p) call here checks whether dumb mode can be used. It uses the field use_integer_conversion, which is set _after_ the call in the same function. Move check_gl_features() to the end of the function, when use_integer_conversion is finally set. Fixes that it tried to use bilinear filtering with integer textures. The bug disabled the code that is supposed to convert it to non-integer textures.
* vo_gpu: hwdec_dxva2dxgi: initial implementationJames Ross-Gowan2018-01-061-0/+4
| | | | | | | | | | | | | This enables DXVA2 hardware decoding with ra_d3d11. It should be useful for Windows 7, where D3D11VA is not available. Images are transfered from D3D9 to D3D11 using D3D9Ex surface sharing[1]. Following Microsoft's recommendations, it uses a queue of shared surfaces, similar to Microsoft's ISurfaceQueue. This will hopefully prevent surface sharing from impacting parallelism and allow multiple D3D11 frames to be in-flight at once. [1]: https://msdn.microsoft.com/en-us/library/windows/desktop/ee913554.aspx
* vo_gpu: d3d11: don't use a bgra8 swapchainJames Ross-Gowan2018-01-041-19/+8
| | | | | | | | | | Previously, mpv would attempt to use a BGRA swapchain in the hope that it would give better performance, since the Windows desktop is also composited in BGRA. In practice, it seems like there is no noticable performance difference between RGBA and BGRA swapchains and BGRA swapchains cause trouble with a42b8b1142fd, which attempts to use the swapchain format for intermediate FBOs, even though D3D11 does not guarantee BGRA surfaces will work with UAV typed stores.
* vo_gpu: vulkan: omit needless #defineNiklas Haas2017-12-251-5/+0
|
* vo_gpu: attempt re-using the FBO format for p->output_texNiklas Haas2017-12-252-1/+9
| | | | | | | | | This allows RAs with support for non-opaque FBO formats to use a more appropriate FBO format for the output tex, possibly enabling a more efficient blit operation. This requires distinguishing between real formats (which can be used to create textures) and fake formats (e.g. ra_gl's FBO hack).
* vo_gpu: aggressively prefer async computeNiklas Haas2017-12-252-0/+6
| | | | | | | | | | On AMD devices, we only get one graphics pipe but several compute pipes which can (in theory) run independently. As such, we should prefer compute shaders over fragment shaders in scenarios where we expect them to be better for parallelism. This is amusingly trivial to do, and actually improves performance even in a single-queue scenario.
* vo_gpu: invalidate fbotex before drawingNiklas Haas2017-12-254-10/+11
| | | | | Don't discard the OSD or pass_draw_to_screen passes though. Could be faster on some hardware.
* vo_gpu: allow invalidating FBO in renderpass_runNiklas Haas2017-12-251-0/+3
| | | | | | | | | This is especially interesting for vulkan since it allows completely skipping the layout transition as part of the renderpass. Unfortunately, that also means it needs to be put into renderpass_params, as opposed to renderpass_run_params (unlike #4777). Closes #4777.
* msg: reinterpret a bunch of message levelsNiklas Haas2017-12-153-34/+17
| | | | | | | | | | | | | | | | | | | | | | I've decided that MP_TRACE means “noisy spam per frame”, whereas MP_DBG just means “more verbose debugging messages than MSGL_V”. Basically, MSGL_DBG shouldn't create spam per frame like it currently does, and MSGL_V should make sense to the end-user and provide mostly additional informational output. MP_DBG is basically what I want to make the new default for --log-file, so the cut-off point for MP_DBG is if we probably want to know if for debugging purposes but the user most likely doesn't care about on the terminal. Also, the debug callbacks for libass and ffmpeg got bumped in their verbosity levels slightly, because being external components they're a bit less relevant to mpv debugging, and a bit too over-eager in what they consider to be relevant information. I exclusively used the "try it on my machine and remove messages from MSGL_* until it does what I want it to" approach of refactoring, so YMMV.
* hwdec: document a forgotten parameterwm42017-12-111-0/+1
| | | | Add the "all" value to the --gpu-hwdec-interop help output.
* video: remove code duplication by calling a hwdec loader helperwm42017-12-111-2/+1
| | | | | Make gl_video_load_hwdecs() call gl_video_load_hwdecs_all() when all HW decoders should be loaded.
* video: properly initialize and set hwdec_interopwm42017-12-111-0/+2
| | | | Don't reset --gpu-hwdec-interop if vo_gpu uses dumb mode.
* vo_gpu: d3d11_helpers: use better formatting for PCI IDsJames Ross-Gowan2017-12-041-4/+6
| | | | | The old format was definitely misleading, since it used an 0x prefix and formatted the device IDs with %d.
* Fix various typos in log messagesNicolas F2017-12-031-1/+1
|
* vo_gpu: hwdec: remove redundant fieldswm42017-12-011-5/+2
| | | | | | | | | | | | | The testing_only field is not referenced anymore with vaglx removed and the previous commit dropping all uses. The ra_hwdec_driver.api field became unused with the previous commit, but all hwdec interop drivers still initialized it. Since this touches highly OS-specific code, build regressions are possible (plus the previous commit might break hw decoding at runtime). At least hwdec_cuda.c still used the .api field, other than initializing it.
* vo_gpu: make it possible to load multiple hwdec interop driverswm42017-12-014-98/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the VO<->decoder interface capable of supporting multiple hwdec APIs at once. The main gain is that this simplifies autoprobing a lot. Before this change, it could happen that the VO loaded the "wrong" hwdec API, and the decoder was stuck with the choice (breaking hw decoding). With the change applied, the VO simply loads all available APIs, so autoprobing trickery is left entirely to the decoder. In the past, we were quite careful about not accidentally loading the wrong interop drivers. This was in part to make sure autoprobing works, but also because libva had this obnoxious bug of dumping garbage to stderr when using the API. libva was fixed, so this is not a problem anymore. The --opengl-hwdec-interop option is changed in various ways (again...), and renamed to --gpu-hwdec-interop. It does not have much use anymore, other than debugging. It's notable that the order in the hwdec interop array ra_hwdec_drivers[] still matters if multiple drivers support the same image formats, so the option can explicitly force one, if that should ever be necessary, or more likely, for debugging. One example are the ra_hwdec_d3d11egl and ra_hwdec_d3d11eglrgb drivers, which both support d3d11 input. vo_gpu now always loads the interop lazily by default, but when it does, it loads them all. vo_opengl_cb now always loads them when the GL context handle is initialized. I don't expect that this causes any problems. It's now possible to do things like changing between vdpau and nvdec decoding at runtime. This is also preparation for cleaning up vd_lavc.c hwdec autoprobing. It's another reason why hwdec_devices_request_all() does not take a hwdec type anymore.
* vd_lavc: prefer nvdec over vdpau with --hwdec=autowm42017-11-301-3/+3
| | | | | | | | | | | | | | | | nvdec aka cuvid aka cuda should work much better than vdpau, and support newer codecs (such as vp9), and more advanced surface formats (like 10 bit). This requires moving the d3d hwaccels in the autoprobe order, since on Windows, d3d decoding should be preferred over nvidia proprietary stuff. Users of older drivers will need to force --hwdec=vdpau, since it could happen that the vo_gpu cuda hwdec interop loads (so the vdpau interop is not loaded), but the hwdec itself doesn't work. I expect this does not break AMD (which still needs vdpau for vo_gpu interop, until libva is fixed so it can fully support AMD).
* vo_gpu: remove hwdec_vaglx interopwm42017-11-301-3/+0
| | | | | | This has stopped being useful a long time ago, and it's the only GPL source file in the vo_gpu source directories. Recently it wasn't even loaded at all, unless you forced loading it.
* vo_gpu: d3d11_helpers: don't try BGRA_SUPPORTJames Ross-Gowan2017-11-191-20/+4
| | | | | | | | | The D3D11_CREATE_DEVICE_BGRA_SUPPORT flag doesn't enable support for BGRA textures. BGRA textures will be supported whether or not the flag is passed. The flag just fails device creation if they are not supported as an API convenience for programs that need BGRA textures, such as programs that use D2D or D3D9 interop. We can handle devices without BGRA support fine, so don't bother with the flag.
* vo_gpu: never pass flipped images to ra or ra backendswm42017-11-101-2/+7
| | | | | | | | | | | | | | | | | | | Seems like the last refactor to this code broke playing flipped images, at least with --opengl-pbo --gpu-api=opengl. Flipping is rather a shitmess. The main problem is that OpenGL does not support flipped uploading. The original vo_gl implementation considered it important to handle the flipped case efficiently, so instead of uploading the image line by line backwards, it uploaded it flipped, and then flipped it in the renderer (basically the upload path ignored the flipping). The ra code and backends probably have an insane and inconsistent mix of semantics, so fix this by never passing it flipped images in the first place. In the future, the backends should probably support flipped images directly. Fixes #5097.
* vo_gpu: move d3d11_screenshot to shared codeJames Ross-Gowan2017-11-072-0/+85
| | | | This can be used by the ANGLE backend and ra_d3d11.
* vo_gpu: d3d11: add RA caps for ra_d3d11James Ross-Gowan2017-11-074-5/+20
| | | | | | | | | | | | | | | | ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is then translated to HLSL. This means it always exposes the same GLSL version that the SPIR-V compiler supports (4.50 for shaderc/glslang.) Despite claiming to support GLSL 4.50, some features that are tied to the GLSL version in OpenGL are not supported by ra_d3d11 when targeting legacy Direct3D feature levels. This includes two features that mpv relies on: - Reading from gl_FragCoord in the fragment shader (requires FL 10_0) - textureGather from any texture component (requires FL 11_0) These features have been exposed as new RA caps.
* vo_gpu: d3d11: initial implementationJames Ross-Gowan2017-11-074-0/+491
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a new RA/vo_gpu backend that uses Direct3D 11. The GLSL generated by vo_gpu is cross-compiled to HLSL with SPIRV-Cross. What works: - All of mpv's internal shaders should work, including compute shaders. - Some external shaders have been tested and work, including RAVU and adaptive-sharpen. - Non-dumb mode works, even on very old hardware. Most features work at feature level 9_3 and all features work at feature level 10_0. Some features also work at feature level 9_1 and 9_2, but without high-bit- depth FBOs, it's not very useful. (Hardware this old is probably not fast enough for advanced features anyway.) Note: This is more compatible than ANGLE, which requires 9_3 to work at all (GLES 2.0,) and 10_1 for non-dumb-mode (GLES 3.0.) - Hardware decoding with D3D11VA, including decoding of 10-bit formats without truncation to 8-bit. What doesn't work / can be improved: - PBO upload and direct rendering does not work yet. Direct rendering requires persistent-mapped PBOs because the decoder needs to be able to read data from images that have already been decoded and uploaded. Unfortunately, it seems like persistent-mapped PBOs are fundamentally incompatible with D3D11, which requires all resources to use driver- managed memory and requires memory to be unmapped (and hence pointers to be invalidated) when a resource is used in a draw or copy operation. However it might be possible to use D3D11's limited multithreading capabilities to emulate some features of PBOs, like asynchronous texture uploading. - The blit() and clear() operations don't have equivalents in the D3D11 API that handle all cases, so in most cases, they have to be emulated with a shader. This is currently done inside ra_d3d11, but ideally it would be done in generic code, so it can take advantage of mpv's shader generation utilities. - SPIRV-Cross is used through a NIH C-compatible wrapper library, since it does not expose a C interface itself. The library is available here: https://github.com/rossy/crossc - The D3D11 context could be made to support more modern DXGI features in future. For example, it should be possible to add support for high-bit-depth and HDR output with DXGI 1.5/1.6.
* vo_gpu: export the GLSL format qualifier for ra_formatJames Ross-Gowan2017-11-073-14/+73
| | | | | | | | | | | | | | | Backported from @haasn's change to libplacebo, except in the current RA, there's nothing to indicate an ra_format can be bound as a storage image, so there's no way to force all of these formats to have a glsl_format. Instead, the layout qualifier will be removed if glsl_format is NULL. This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while loading float values from unorm images often works as expected, it's technically undefined behaviour, and in Windows 10, it will cause the debug layer to spam the log with error messages. Also, apparently in GLSL, the format name must match the image's format exactly (but in Direct3D, it just has to have the same component type.)
* vo_gpu: add namespace query mechanismJames Ross-Gowan2017-11-072-9/+9
| | | | | | Backported from @haasn's change to libplacebo. More flexible than the previous "shared || non-shared" distinction. The extra flexibility is needed for Direct3D 11, but it also doesn't hurt code-wise.
* vo_gpu: don't re-render hwdec frames when repeating frameswm42017-11-031-19/+11
| | | | | | | | | | | | | Repeating frames (for display-sync) is not supposed to render the entire frame again. When using hardware decoding, it unfortunately did: the renderer uses the frame ID to check whether the frame data changed, and unmapping the hwdec frame clears it. Essentially reverts commit 761eeacf5407cab07. Back then I probably thought it would be a good idea to release the hwdec image quickly in order to return it to the decoder, but they're referenced anyway. This should increase the performance and reduce GPU work.
* vo_gpu: fix mobius tone mapping compatibility to GLSL 120wm42017-11-031-1/+1
| | | | | | | | | Normally such code is didsabled by have_mglsl==false in check_gl_features(), but apparently not this one. Just fix it. Seems also more readable. Fixes #5069.
* vo_gpu: remove a redundant ifdefwm42017-10-301-2/+0
|
* vd_lavc: remove need for duplicated cuda GL interop backendwm42017-10-301-1/+0
| | | | | | | This is just a dumb consequence of HWDEC_ types somehow being part of both decoder and VO. Obviously, the VO should only care about supporting specific hardware surface types or providing specific device types, but until they are separated, stupid unintuitive mismatches will occur.
* vd_lavc: add support for nvdec hwaccelwm42017-10-281-0/+2
| | | | | | | | See manpage additions. (In ffmpeg-mpv and Libav, this is still called "cuvid". Libav won't work yet, because it has no frame params support yet, but this could get fixed soon.)
* vo_gpu: fix ra_tex_upload_pbo for 2D texturesNiklas Haas2017-10-271-1/+5
| | | | | | params->rc was ignored in the calculation for the buffer size. I fucking hate this stupid ra_tex_upload signature where *rc is randomly relevant or not.
* vo_gpu: osd: simplify some codewm42017-10-271-5/+5
| | | | | | Coverity complains about this, but it's probably a false positive. Anyway, rewrite it in a slightly more readable way. Now it's more obvious that it is correct.
* vo_gpu: change --tone-mapping-desaturate algorithmNiklas Haas2017-10-252-9/+11
| |