summaryrefslogtreecommitdiffstats
path: root/video/out/gpu
Commit message (Collapse)AuthorAgeFilesLines
* vo_gpu: vulkan: Add support for exporting buffer memoryPhilip Langdale2018-10-221-0/+1
| | | | | | | | | | | | | | | | | | | | The CUDA/Vulkan interop works on the basis of memory being exported from Vulkan and then imported by CUDA. To enable this, we add a way to declare a buffer as being intended for export, and then add a function to do the export. For now, we support the fd and Handle based exports on Linux and Windows respectively. There are others, which we can support when a need arises. Also note that this is just for exporting buffers, rather than textures (VkImages). Image import on the CUDA side is supposed to work, but it is currently buggy and waiting for a new driver release. Finally, at least with my nvidia hardware and drivers, everything seems to work even if we don't initialise the buffer with the right exportability options. Nevertheless I'm enforcing it so that we're following the spec.
* vo_gpu: vulkan: fix strncpy truncation in spirv_compiler_initBtbN2018-10-211-1/+1
| | | | | | Fixes GCC8 warning ../video/out/gpu/spirv.c: In function 'spirv_compiler_init': ../video/out/gpu/spirv.c:68:9: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation]
* vo_gpu: split --linear-scaling into two separate optionsNiklas Haas2018-10-192-13/+25
| | | | | | | | | | | | | | | | | | Since linear downscaling makes sense to handle independently from linear/sigmoid upscaling, we split this option up. Now, linear-downscaling is its own option that only controls linearization when downscaling and nothing more. Likewise, linear-upscaling / sigmoid-upscaling are two mutually exclusive options (the latter overriding the former) that apply only to upscaling and no longer implicitly enable linear light downscaling as well. The old behavior was very confusing, as evidenced by issues such as #6213. The current behavior should make much more sense, and only minimally breaks backwards compatibility (since using linear-scaling directly was very uncommon - most users got this for free as part of gpu-hq and relied only on that). Closes #6213.
* vo_gpu: fix vec3 packing in UBOs/push_constantsNiklas Haas2018-09-291-11/+13
| | | | | | | | | | | For vec3, the alignment and size differ. The current code will pack a struct like { vec3; float; vec2 } into 8 machine words, whereas the spec would only use 6. This actually fixes a real bug: The only place in the code I could find where it was conceivably possible that a vec3 is followed by a float was when using --gpu-dumb-mode in combination with --gamma-factor, and only when --gpu-api=vulkan. So it's no surprised nobody ran into it yet.
* vo_gpu: use explicit offsets for push constantsNiklas Haas2018-09-291-2/+1
| | | | | | | | | These used to be unsupported long ago, but it seems glslang added support in the meantime. (I don't know which version, but I'm guessing it was long enough ago that we don't have to add a feature check) Should hopefully help make push constant layouts more robust against possible bugs either in our code or in the driver.
* vo_gpu: adjust PRNG variant used by GL shaderssfan52018-09-261-1/+5
| | | | | | | | | | | Certain low-end Mali GPUs have a rather low precision and overflow during the PRNG calculations, thereby breaking e.g. deband-grain. Modify the permute() to avoid this, this does not impact the quality of PRNG output (noticeably). This problem was observed on: GL_VENDOR='ARM', GL_RENDERER='Mali-T720' GL_VERSION='OpenGL ES 3.1 v1.r15p0-00rel0.bdd9e62cdc8c88e0610a16b5901161e9'
* vo_gpu: switch to optimization level performanceNiklas Haas2018-09-011-1/+1
| | | | | | Upstream has this now. Didn't really make any different for me (except making the polar compute shader 2%-3% faster), but maybe it does for somebody else.
* vo_gpu: avoid overwriting compute shader block sizesNiklas Haas2018-08-261-4/+10
| | | | | | | | | | When using multiple compute shaders as part of the same pass, there can be a conflict in the block sizes. In the problematic case, the HDR detection shader can collide with the polar sampling shader. In this case, the solution is clear - the passes that can handle any size should "give in" and not overwrite the block sizes. Fixes #6083.
* wscript: split egl-android from androidTom Yan2018-08-201-1/+1
|
* gpu: prefer 16bit floating point FBO formats to 16bit integer onesJan Ekström2018-07-081-1/+1
| | | | | | According to earlier discussions, this can improve visual quality. This only changes the preferred order of the formats, not the formats themselves.
* vo_gpu: desaturate after peak detectionNiklas Haas2018-05-311-12/+12
| | | | | | | | | | This sacrifices some dynamic range for well-behaved sources, but prevents catastrophic desaturation on badly mastered / too bright sources. I think that's the better trade-off. This makes the desaturation algorithm much "safer" to deploy by default, as well. One could even argue going up to strength 1.0, which works better for some sources but worse for others. But I think the current strength is the best trade-off even after this change.
* vo_gpu: allow higher icc-contrast and improve loggingNiklas Haas2018-05-171-2/+3
| | | | | | | | | | With the advent of actual HDR devices, my real measured ICC profile has an "infinite" contrast, since the display is completely off on pure black inputs. 100k:1 might not be enough, so let's just bump it up to 1m:1 to be safe. Also, improve the logging in the case that the detected contrast is too high by default.
* drm/atomic: refactor hwdec_drmprime_drm with native resourcesLongChair2018-05-011-0/+9
| | | | | | | | | | | | | | | | | That new API was introduced and allows to have several native resources. Thisuses that mechanisma for drm resources rather than the deprecated opengl-cb structs. This patch therefore add two structs that can be used with the drm atomic interop. - mpv_opengl_drm_params : which will hold all the drm handles - mpv_opengl_drm_osd_size : which will hold osd layer size This commit adds a drm-osd-size=WxH parameter to commandline which allows to define the OSD plane dimension. OSD can be upscaled to screen resolution when having OSD at video resolution is too heavy. This is especially useful for UHD modes on embedded devices where the GPU cannot handle UHD modes at a decent framerate.
* vo_gpu/video: disable compute shaders if an FBO format was not availableJan Ekström2018-05-011-0/+5
| | | | | This is actually more generic and better than just lazily plastering peak calculation together with dumb mode.
* vo_gpu/video: add improved logging when a user-specified FBO failsJan Ekström2018-05-011-2/+13
| | | | | I don't know if we can just return from this function, so for now just adding this piece of logging.
* gpu/video: make HDR peak computing work without work group countNiklas Haas2018-04-291-4/+5
| | | | | | | | | | Define a hard-coded value for gl_NumWorkGroups if it is not available. This adds an additional requirement of needing a shader recompile for all window size changes. This was considered a worthwhile compromise as currently f.ex. d3d11 completely lacked any peak computation - this is a major quality of life upgrade.
* gpu/video: improve HDR peak computation feature check loggingJan Ekström2018-04-291-1/+4
| | | | | Now that the feature depends on multiple features, log all of their states in the message.
* video: remove internal stereo_out flagwm42018-04-291-2/+2
| | | | | | Also rename stereo3d to stereo_in. The only real change is that the vo_gpu OSD code now uses the actual stereo 3D mode, instead of the --video-steroe-mode value. (Why does this vo_gpu code even exist?)
* vo_libmpv: support GPU rendered screenshotswm42018-04-291-0/+9
| | | | | Like DR, this needed a lot of preparation, and here's the boring glue code that finally implements it.
* vo_libmpv: add support for DRwm42018-04-291-0/+9
| | | | | With all the preparation work done, this only has to do the annoying dance of passing it through all the damn layers.
* vo_gpu: move some extra code for screenshot to video.cwm42018-04-201-5/+14
| | | | | This also happens to fix some UB on the error path (target being declared after the first "goto done;").
* client API: clarify that Display pointers etc. need to stay validwm42018-04-161-6/+21
| | | | | | | | | | | | Normally, MPV_RENDER_PARAM* arguments are copied, unless documented otherwise. Of course we can't copy X11 Display or Wayland wl_display types, but for arguments that are "summarized" in a struct (like MPV_RENDER_PARAM_OPENGL_FBO), a copy is expected. Also add some unused infrastructure to make this explicit, and to make it easier to add parameter types that require a copy. Untested.
* client API: add a new way to pass X11 Display etc. to render APIwm42018-03-264-0/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hardware decoding things often need access to additional handles from the windowing system, such as the X11 or Wayland display when using vaapi. The opengl-cb had nothing dedicated for this, and used the weird GL_MP_MPGetNativeDisplay GL extension (which was mpv specific and not officially registered with OpenGL). This was awkward, and a pain due to having to emulate GL context behavior (like needing a TLS variable to store context for the pseudo GL extension function). In addition (and not inherently due to this), we could pass only one resource from mpv builtin context backends to hwdecs. It was also all GL specific. Replace this with a newer mechanism. It works for all RA backends, not just GL. the API user can explicitly pass the objects at init time via mpv_render_context_create(). Multiple resources are naturally possible. The API uses MPV_RENDER_PARAM_* defines, but internally we use strings. This is done for 2 reasons: 1. trying to leave libmpv and internal mechanisms decoupled, 2. not having to add public API for some of the internal resource types (especially D3D/GL interop stuff). To remain sane, drop support for obscure half-working opengl-cb things, like the DRM interop (was missing necessary things), the RPI window thing (nobody used it), and obscure D3D interop things (not needed with ANGLE, others were undocumented). In order not to break ABI and the C API, we don't remove the associated structs from opengl_cb.h. The parts which are still needed (in particular DRM interop) needs to be ported to the render API.
* vo_gpu: fix anamorphic video screenshots (second try)wm42018-03-161-3/+4
| | | | | | | | | | | | | | This passed the display size as source size to the renderer, which is of course nonsense. I don't know what I was doing in 569383bc54. Yet another fix for those damn anamorphic videos. As a somewhat redundant/cosmetic change, use image_params instead of real_image_params in the code above. They should have the same, dimensions (but possibly different formats when doing hw decdoing), and mixing them is confusing. p->image_params wins because it's shorter. Actually fixes #5619.
* vo_gpu: fix anamorphic screenshotswm42018-03-151-2/+2
| | | | | | | | | | | We took the storage size instead of the display size for "unscaled" screenshots. Even if it's called "unscaled", it's still supposed to scale to compensate for aspect ratio. (How many commits fixing anamorphic screenshots in various situations are there?) Fixes #5619.
* vo_gpu: error out if there were rendering errors when taking screenshotwm42018-03-031-1/+5
|
* vo_gpu: fix taking screenshots of rotated videoswm42018-03-031-0/+3
| | | | Good old 90° rotation logic messing everything up.
* client API: deprecate opengl-cb API and introduce a replacement APIwm42018-02-282-0/+220
| | | | | | | | | | | | | | | | | | | | | | | | | The purpose of the new API is to make it useable with other APIs than OpenGL, especially D3D11 and vulkan. In theory it's now possible to support other vo_gpu backends, as well as backends that don't use the vo_gpu code at all. This also aims to get rid of the dumb mpv_get_sub_api() function. The life cycle of the new mpv_render_context is a bit different from mpv_opengl_cb_context, and you explicitly create/destroy the new context, instead of calling init/uninit on an object returned by mpv_get_sub_api(). In other to make the render API generic, it's annoyingly EGL style, and requires you to pass in API-specific objects to generic functions. This is to avoid explicit objects like the internal ra API has, because that sounds more complicated and annoying for an API that's supposed to never change. The opengl_cb API will continue to exist for a bit longer, but internally there are already a few tradeoffs, like reduced thread-safety. Mostly untested. Seems to work fine with mpc-qt.
* vo_gpu: remove a dead declarationwm42018-02-281-1/+0
|
* vo_gpu: fix mobius tone mapping when sig_peak <= 1.0Niklas Haas2018-02-251-0/+2
| | | | | | | Mobius isn't well-defined for sig_peak <= 1.0. We can solve this by just soft-clamping sig_peak to 1.0. Although, in this case, we can just skip tone mapping altogether since the limit of mobius as sig_peak -> 1.0 is just a linear function.
* vo_gpu: don't tone-map for pure gamut reductionsNiklas Haas2018-02-251-3/+0
| | | | | | | | | Based on testing with real-world non-HDR BT.2020 clips, clipping the color space looks better than attempting to gamut map using a tone mapping shader that's (by now) optimized for HDR content. If anything, we'd have to develop a separate gamut mapping shader that works in LCh space.
* vo_gpu: introduce --target-peakNiklas Haas2018-02-203-7/+16
| | | | | | | | | | | | | This solves a number of problems simultaneously: 1. When outputting HLG, this allows tuning the OOTF based on the display characteristics. 2. When outputting PQ or other HDR curves, this allows soft-limiting the output brightness using the tone mapping algorithm. 3. When outputting SDR, this allows HDR-in-SDR style output, by controlling the output brightness directly. Closes #5521
* vo_gpu: correctly parametrize the HLG OOTF by the display peakNiklas Haas2018-02-201-20/+31
| | | | | | | | | | | | | | The HLG OOTF is defined as a one-parameter family of OOTFs depending on the display's peak luminance. With the preceding change to OOTF scale and handling, we no longer have any issues with outputting values in whatever signal range we need. So as a result, it's easy for us to support a tunable OOTF which may (drastically) alter the display brightness. In fact, this is also the only correct way to do it, because the HLG appearance depends strongly on the OOTF configuration. For the OOTF, we consult the mastering display's tagging (via src.sig_peak). For the inverse OOTF, we consult the output display's target peak.
* vo_gpu: simplify and correct color scale handlingNiklas Haas2018-02-203-33/+40
| | | | | | | | | | | | | | | | | | | | | | | | | The primary need for this change is the fact that the OOTF was incorrectly scaled, due to the fact that the application of the OOTF can itself change the required normalization peak. (Plus, an oversight in pass_inverse_ootf meant we forgot to normalize at the end of it) The linearize/delinearize functions still normalize the scale since it's used in a number of places throughout gpu/video.c, but the color management function now converts to absolute scale right away, instead of in an awkward way inside the tone mapping branch. The OOTF functions now work in absolute scale only. In addition, minor changes have been made to the way normalization is handled for tone mapping - we now divide out the dst_peak *after* peak detection, in order to make the scale of the peak detection buffer consistent even if the dst_peak were to (hypothetically) change mid-stream. In theory, we could also do this for desaturation, but doing the desaturation before tone mapping has the advantage of preserving much more brightness than the other way around - and even mid-stream changes are not that drastic here. Finally, some preparation work has been done for allowing the user to customize the `dst.sig_peak` in the future.
* vo_gpu: remove old window screenshot glue code and GL implementationwm42018-02-131-3/+0
| | | | | | | | | | | There is now a better way. Reading the font framebuffer was always a hack. The new code via VOCTRL_SCREENSHOT renders it into a FBO, which does not come with the disadvantages of reading the front buffer (like not being supported by GLES, possibly black regions due to overlapping windows on some systems). For now keep VOCTRL_SCREENSHOT_WIN on the VO level, because there are still some lesser VOs and backends that use it.
* vo_gpu: d3d11: implement tex_download()James Ross-Gowan2018-02-132-83/+0
| | | | | | | This allows the new GPU screenshot functionality introduced in 9f595f3a80ee to work with the D3D11 backend. It replaces the old window screenshot functionality, which was shared between D3D11 and ANGLE. The old code can be removed, since it's not needed by ANGLE anymore either.
* vo_gpu: use a variable for the RA_CAP_FRAGCOORD flagJames Ross-Gowan2018-02-131-4/+3
| | | | | This is just a cosmetic change. Now the RA_CAP_FRAGCOORD check looks like all the others.
* vo_gpu: check for HDR peak detection in dumb mode tooJames Ross-Gowan2018-02-131-7/+8
| | | | | | | | Similar spirit to edb4970ca8b9. check_gl_features() has a confusing early-return. This also adds compute_hdr_peak to the list of options that is copied to the dumb-mode options struct, since it seems to make a difference. Otherwise it would be impossible to disable HDR peak detection in dumb mode.
* vo_gpu: make screenshots use the GL rendererwm42018-02-113-1/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | Using the GL renderer for color conversion will make sure screenshots will use the same conversion as normal video rendering. It can do this for all types of screenshots. The logic when to write 16 bit PNGs changes. To approximate the old behavior, we decide by looking whether the source video format has more than 8 bits per component. We apply this logic even for window screenshots. Also, 16 bit PNGs now always include an unused alpha channel. The reason is that FFmpeg has RGB48 and RGBA64 formats, but no RGB064. RGB48 is 3 bytes and usually not supported by GPUs for rendering, so we have to use RGBA64, which forces an alpha channel. Will break for users who use --target-trc and similar options. I considered creating a new gl_video context, but it could double GPU memory use, so I didn't. This uses FBOs instead of glGetTexImage(), because that increases the chance it could work on GLES (e.g. ANGLE). Untested. No support for the Vulkan and D3D11 backends yet. Fixes #5498. Also fixes #5240, because the code for reading back is not used with the new code path.
* vo_gpu: add internal ability to skip osd/subs for renderingwm42018-02-112-15/+37
| | | | Needed for the following commit.
* vo_gpu: use blit() only if target ra_tex supports itwm42018-02-111-2/+3
| | | | | Even if RA_CAP_BLIT is set, this might just not be enabled for the target ra_tex.
* vo_gpu: add memory barrier on the HDR peak detectionNiklas Haas2018-02-111-0/+1
| | | | | This can cause the peak detection state to be inconsistent in rare cases, which might explain the issues when taking screenshots in #5499.
* vo_gpu: correctly infer HDR peak detection supportNiklas Haas2018-02-111-1/+4
| | | | | | The re-ordering of commits e3d93fd and 0870859 ended up swallowing the change which made the HDR tone mapping algorithm actually check for RA_CAP_NUM_GROUPS support.
* vo_gpu: refactor HDR peak detection algorithmNiklas Haas2018-02-113-16/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The major changes are as follows: 1. Use `uint32_t` instead of `unsigned int` for the SSBO size calculation. This doesn't really matter, since a too-big buffer will still work just fine, but since `uint` is a 32-bit integer by definition this is the correct way to do it. 2. Pre-divide the frame_sum by the num_wg immediately at the end of a frame. This change was made to prevent overflow. At 4K screen size, this code is currently already very at risk of overflow, especially once I started playing with longer averaging sizes. Pre-dividing this out makes it just about fit into 32-bit even for worst-case PQ content. (It's technically also faster and easier this way, so I should have done it to begin with). Rename `frame_sum` to `frame_avg` to clearly signal the change in semantics. 3. Implement a scene transition detection algorithm. This basically compares the current frame's average brightness against the (averaged) value of the past frames. If it exceeds a threshold, which I experimentally configured, we reset the peak detection SSBO's state immediately - so that it just contains the current frame. This prevents annoying "eye adaptation"-like effects on scene transitions. 4. As a result of the previous change, we can now use a much larger buffer size by default, which results in a more stable and less flickery result. I experimented with values between 20 and 256 and settled on the new value of 64. (I also switched to a power-of-2 array size, because I like powers of two)
* vo_gpu: port HDR tone mapping algorithm from libplaceboNiklas Haas2018-02-053-70/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current peak detection algorithm was very bugged (which contributed to the excessive cross-frame flicker without long normalization) and also didn't take into account the frame average brightness level. The new algorithm both takes into account frame average brightness (in addition to peak brightness), and also computes the values in a more stable/correct way. (The old path was basically undefined behavior) In addition to improving the algorithm, we also switch to hable tone mapping by default, and try to enable peak computation automatically whever possible (compute shaders + SSBOs supported). We also make the desaturation milder, after extensive testing during libplacebo development. I also had to compensate a bit for the representational differences between mpv and libplacebo (libplacebo treats 1.0 as the reference peak, but mpv treats it as the nominal peak), but it shouldn't have caused any problems. This is still not quite the same as libplacebo, since libplacebo also allows tagging the desired scene average brightness on the output, and it also supports reading the scene average brightness from static metadata (MaxFALL) where available. But those changes are a bit more involved. It's possible we could also read this from metadata in the future, but we have problems communicating with AVFrames as it is and I don't want to touch the mpv colorimetry structs for the time being.
* vo_gpu: add RA_CAP for gl_NumWorkGroupsNiklas Haas2018-02-051-0/+1
| | | | | SPIRV-Cross doesn't support this for the time being. It's possible this could go away again at a later date.
* vo_gpu: check for RA_CAP_FRAGCOORD in dumb mode tooJames Ross-Gowan2018-01-301-13/+14
| | | | | | | The RA_CAP_FRAGCOORD checks apply to dumb mode as well, but they were after the check for dumb mode, which returns early, so they never ran. Fixes #5436
* video: fix crash with vdpau when reinitializing renderingwm42018-01-271-3/+3
| | | | | | | | | | Using vdpau will allocate additional textures for the reinterleaving step, which uninit_rendering() will free. This is a problem because the hwdec image remains mapped when reinitializing, so the reinterleaving textures are turned into dangling pointers. Fix this by freeing the reinterleave textures on full uninit instead. Fixes #5447.
* hwdec: detach d3d and d3d9 hwaccel from anglemyfreeer2018-01-251-1/+3
| | | | Fix https://github.com/mpv-player/mpv/issues/5420
* video: change some remaining vo_opengl mentions to vo_gpuAkemi2018-01-201-1/+1
|
*