path: root/video/out/gpu/video.h
* vo_gpu: implement error diffusion for dithering (Bin Jin, 2019-06-16, 1 file, -0/+2)
This is a straightforward parallel implementation of error diffusion algorithms in a compute shader. Basically, we use a single work group with the maximal possible size to process the whole image. After a shift mapping we are able to process all pixels column by column.

A large ring buffer is allocated in shared memory to speed things up. However, the size of the required shared memory depends linearly on the height of the video window (or the screen height in fullscreen mode). If there is not enough shared memory, it falls back to `--dither=fruit`.

The maximal allowed work group size is hardcoded as 1024. Ideally we could query `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS`, but for whatever reason it seems most high-end cards from NVIDIA and AMD support only the minimal required value, so I guess we can stick to it for now.
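As a rough illustration of the shared-memory constraint described above, here is a small C sketch; the helper name and the exact sizing formula are assumptions for illustration, not mpv's actual code:

    #include <stdbool.h>
    #include <stddef.h>

    /* Assumed model: the diffusion filter spreads error over `filter_rows`
     * rows, and the ring buffer stores one float of residual error per row
     * per column in a sliding window, so its size grows linearly with the
     * image height. */
    static bool error_diffusion_fits(int video_height, int filter_rows,
                                     size_t shared_mem_limit)
    {
        size_t ring_entries = (size_t)filter_rows * (video_height + 2 * filter_rows);
        return ring_entries * sizeof(float) <= shared_mem_limit;
    }

    /* Usage sketch: if (!error_diffusion_fits(h, 3, 32 * 1024)), fall back
     * to --dither=fruit as described above. */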
* vo_gpu: use dB units for scene change detection (Niklas Haas, 2019-02-18, 1 file, -2/+2)
Rather than the linear cd/m^2 units, these (relative) logarithmic units lend themselves much better to actually detecting scene changes, especially since the scene averaging was changed to also work logarithmically.
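For reference, a minimal C sketch of the kind of conversion involved; the reference level and the thresholding shown in the comment are illustrative assumptions, not mpv's actual values:

    #include <math.h>

    /* Convert a linear luminance value to (relative) decibels. Luminance is
     * a power-like quantity, hence the factor of 10. */
    static double linear_to_db(double lum, double ref)
    {
        return 10.0 * log10(lum / ref);
    }

    /* A scene change can then be flagged when the frame-to-frame difference
     * exceeds a tunable threshold in dB, e.g.:
     *   fabs(linear_to_db(cur_avg, ref) - linear_to_db(prev_avg, ref)) > thresh_db
     */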
* vo_gpu: allow boosting dark scenes when tone mapping (Niklas Haas, 2019-02-18, 1 file, -0/+1)
In theory our "eye adaptation" algorithm works both ways, darkening bright scenes as well as brightening dark scenes. But I've always just prevented the latter with a hard clamp, since I wanted to avoid blowing up dark scenes into looking funny (and full of noise).

But allowing a tiny bit of over-exposure might be a good thing. I won't change the default just yet (better to let users test), but a moderate value of 1.2 might be better than the current 1.0 limit. Needs testing, especially on dark scenes.
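A minimal sketch of what such a limit amounts to; the name and gain semantics here are assumptions for illustration only:

    /* `gain` is the exposure correction derived from peak/average detection;
     * `boost_limit` is the tunable cap: 1.0 never brightens dark scenes,
     * 1.2 allows up to 20% of over-exposure. Darkening (gain < 1.0) is
     * always allowed. */
    static float clamp_exposure_gain(float gain, float boost_limit)
    {
        return gain > boost_limit ? boost_limit : gain;
    }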
* vo_gpu: redesign peak detection algorithm (Niklas Haas, 2019-02-18, 1 file, -3/+3)
The previous approach of using an FIR with tunable hard threshold for scene changes had several problems:

- the FIR involved annoying hard-coded buffer sizes, high VRAM usage, and the FIR sum was prone to numerical overflow which limited the number of frames we could average over. We also totally redesign the scene change detection.

- the hard scene change detection was prone to both false positives and false negatives, each with their own (annoying) issues.

Scrap this entirely and switch to a dual approach of using a simple single-pole IIR low pass filter to smooth out noise, while using a softer scene change curve (with tunable low and high thresholds), based on `smoothstep`.

The IIR filter is extremely simple in its implementation and has an arbitrarily user-tunable cutoff frequency, while the smoothstep-based scene change curve provides a good, tunable tradeoff between adaptation speed and stability - without exhibiting either of the traditional issues associated with the hard cutoff.

Another way to think about the new options is that the "low threshold" provides a margin of error within which we don't care about small fluctuations in the scene (which will therefore be smoothed out by the IIR filter).
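The two building blocks can be sketched in C roughly as follows; the names, the exact update rule and the thresholds are assumptions for illustration, not the actual shader code:

    #include <math.h>

    static double smoothstep(double lo, double hi, double x)
    {
        double t = (x - lo) / (hi - lo);
        t = fmin(fmax(t, 0.0), 1.0);
        return t * t * (3.0 - 2.0 * t);
    }

    /* `x` is the new frame measurement (e.g. log-average brightness), `state`
     * is the smoothed value carried across frames. */
    static double update_state(double state, double x, double cutoff_hz,
                               double frame_dt, double scene_lo, double scene_hi)
    {
        /* Single-pole IIR low pass: smooths out frame-to-frame noise, with a
         * user-tunable cutoff frequency. */
        double weight = 1.0 - exp(-2.0 * 3.14159265358979 * cutoff_hz * frame_dt);

        /* Soft scene-change curve: the further the new value deviates from
         * the smoothed state (between the low and high thresholds), the more
         * we snap towards it instead of filtering it out. */
        double snap = smoothstep(scene_lo, scene_hi, fabs(x - state));

        weight += (1.0 - weight) * snap;
        return state + weight * (x - state);
    }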
* vo_gpu: improve tone mapping desaturation (Niklas Haas, 2019-02-18, 1 file, -5/+10)
Instead of desaturating towards luma, we desaturate towards the per-channel tone mapped version. This essentially provides a smooth roll-off towards the "hollywood"-style (non-chromatic) tone mapping algorithm, which works better for bright content, while continuing to use the "linear"-style (chromatic) tone mapping algorithm for primarily in-gamut content.

We also split up the desaturation algorithm into strength and exponent, which allows users to use less aggressive desaturation settings without affecting the overall curve.
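Roughly, the blend can be pictured like this C sketch; the names and the exact coefficient formula are assumptions, not the actual shader code:

    #include <math.h>

    /* Blend the chromatic ("linear") tone-mapped color towards the
     * non-chromatic per-channel ("hollywood") result. The brighter the pixel,
     * the stronger the blend; `strength` and `exponent` are the two new
     * user-tunable parameters. */
    static void desaturate(const float linear_tm[3], const float per_channel_tm[3],
                           float luma, float strength, float exponent, float out[3])
    {
        float coeff = strength * powf(fmaxf(luma, 0.0f), exponent);
        coeff = fminf(coeff, 1.0f);
        for (int i = 0; i < 3; i++)
            out[i] = linear_tm[i] + coeff * (per_channel_tm[i] - linear_tm[i]);
    }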
* vo_gpu: split --linear-scaling into two separate options (Niklas Haas, 2018-10-19, 1 file, -1/+2)
Since linear downscaling makes sense to handle independently from linear/sigmoid upscaling, we split this option up. Now, linear-downscaling is its own option that only controls linearization when downscaling and nothing more. Likewise, linear-upscaling / sigmoid-upscaling are two mutually exclusive options (the latter overriding the former) that apply only to upscaling and no longer implicitly enable linear light downscaling as well.

The old behavior was very confusing, as evidenced by issues such as #6213. The current behavior should make much more sense, and only minimally breaks backwards compatibility (since using linear-scaling directly was very uncommon - most users got this for free as part of gpu-hq and relied only on that).

Closes #6213.
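For illustration, the split options can now be set independently, e.g. in mpv.conf; the values here are examples, not necessarily the defaults:

    # linearize only when downscaling
    linear-downscaling=yes
    # use sigmoidized linear light when upscaling
    # (mutually exclusive with, and overriding, linear-upscaling)
    sigmoid-upscaling=yes
    # linear-upscaling=yes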
* vo_gpu: remove a dead declaration (wm4, 2018-02-28, 1 file, -1/+0)
* vo_gpu: introduce --target-peak (Niklas Haas, 2018-02-20, 1 file, -1/+1)
This solves a number of problems simultaneously:

1. When outputting HLG, this allows tuning the OOTF based on the display characteristics.
2. When outputting PQ or other HDR curves, this allows soft-limiting the output brightness using the tone mapping algorithm.
3. When outputting SDR, this allows HDR-in-SDR style output, by controlling the output brightness directly.

Closes #5521
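Illustrative command lines; the values are examples only, and the exact semantics and defaults are documented in the mpv manual:

    # soft-limit HDR (PQ) output for a display that peaks around 500 cd/m^2
    mpv --target-trc=pq --target-peak=500 video.mkv

    # HDR-in-SDR style output on an ordinary SDR display
    mpv --target-peak=200 video.mkv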
* vo_gpu: make screenshots use the GL renderer (wm4, 2018-02-11, 1 file, -0/+4)
Using the GL renderer for color conversion will make sure screenshots use the same conversion as normal video rendering. It can do this for all types of screenshots.

The logic for when to write 16 bit PNGs changes. To approximate the old behavior, we decide by looking at whether the source video format has more than 8 bits per component. We apply this logic even for window screenshots. Also, 16 bit PNGs now always include an unused alpha channel. The reason is that FFmpeg has RGB48 and RGBA64 formats, but no RGB064. RGB48 is 3 bytes and usually not supported by GPUs for rendering, so we have to use RGBA64, which forces an alpha channel.

Will break for users who use --target-trc and similar options.

I considered creating a new gl_video context, but it could double GPU memory use, so I didn't.

This uses FBOs instead of glGetTexImage(), because that increases the chance it could work on GLES (e.g. ANGLE). Untested. No support for the Vulkan and D3D11 backends yet.

Fixes #5498. Also fixes #5240, because the code for reading back is not used with the new code path.
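The format decision described above boils down to something like the following sketch; the enum and function names are assumptions, not the actual code:

    enum shot_fmt { SHOT_RGB24, SHOT_RGBA64 };

    /* Write a 16 bit PNG iff the source has more than 8 bits per component.
     * A 3-component 16 bit layout is generally not renderable on GPUs, so
     * the 16 bit path uses RGBA64 and carries an unused alpha channel. */
    static enum shot_fmt pick_screenshot_format(int src_bits_per_component)
    {
        return src_bits_per_component > 8 ? SHOT_RGBA64 : SHOT_RGB24;
    }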
* vo_gpu: add internal ability to skip osd/subs for rendering (wm4, 2018-02-11, 1 file, -1/+7)
Needed for the following commit.
* vo_gpu: refactor HDR peak detection algorithm (Niklas Haas, 2018-02-11, 1 file, -1/+1)
The major changes are as follows:

1. Use `uint32_t` instead of `unsigned int` for the SSBO size calculation. This doesn't really matter, since a too-big buffer will still work just fine, but since `uint` is a 32-bit integer by definition this is the correct way to do it.

2. Pre-divide the frame_sum by the num_wg immediately at the end of a frame. This change was made to prevent overflow. At 4K screen size, this code is currently already very much at risk of overflow, especially once I started playing with longer averaging sizes. Pre-dividing this out makes it just about fit into 32 bits even for worst-case PQ content. (It's technically also faster and easier this way, so I should have done it to begin with.) Rename `frame_sum` to `frame_avg` to clearly signal the change in semantics.

3. Implement a scene transition detection algorithm. This basically compares the current frame's average brightness against the (averaged) value of the past frames. If it exceeds a threshold, which I experimentally configured, we reset the peak detection SSBO's state immediately - so that it just contains the current frame. This prevents annoying "eye adaptation"-like effects on scene transitions.

4. As a result of the previous change, we can now use a much larger buffer size by default, which results in a more stable and less flickery result. I experimented with values between 20 and 256 and settled on the new value of 64. (I also switched to a power-of-2 array size, because I like powers of two.)
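Changes 2 and 3 can be sketched roughly as follows in C; the names, the threshold and the exact bookkeeping are assumptions for illustration, not the SSBO-based shader code:

    #include <stdint.h>

    #define PEAK_HIST_SIZE 64   /* the new power-of-two averaging buffer size */

    struct peak_state {
        uint32_t frame_avg[PEAK_HIST_SIZE];
        int idx, count;
    };

    static void push_frame(struct peak_state *s, uint32_t frame_sum,
                           uint32_t num_wg, uint32_t scene_threshold)
    {
        /* Change 2: reduce to an average right away to avoid 32-bit overflow
         * at 4K resolutions and long averaging windows. */
        uint32_t frame_avg = frame_sum / num_wg;

        uint64_t total = 0;
        for (int i = 0; i < s->count; i++)
            total += s->frame_avg[i];
        uint32_t hist_avg = s->count ? (uint32_t)(total / s->count) : frame_avg;

        /* Change 3: hard reset of the history on scene transitions, so the
         * state only contains the current frame. */
        uint32_t diff = frame_avg > hist_avg ? frame_avg - hist_avg
                                             : hist_avg - frame_avg;
        if (diff > scene_threshold)
            s->idx = s->count = 0;

        s->frame_avg[s->idx] = frame_avg;
        s->idx = (s->idx + 1) % PEAK_HIST_SIZE;
        if (s->count < PEAK_HIST_SIZE)
            s->count++;
    }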
* vo_gpu: port HDR tone mapping algorithm from libplacebo (Niklas Haas, 2018-02-05, 1 file, -1/+1)
The current peak detection algorithm was very bugged (which contributed to the excessive cross-frame flicker without long normalization) and also didn't take into account the frame average brightness level. The new algorithm both takes into account frame average brightness (in addition to peak brightness), and also computes the values in a more stable/correct way. (The old path was basically undefined behavior.)

In addition to improving the algorithm, we also switch to hable tone mapping by default, and try to enable peak computation automatically wherever possible (compute shaders + SSBOs supported). We also make the desaturation milder, after extensive testing during libplacebo development.

I also had to compensate a bit for the representational differences between mpv and libplacebo (libplacebo treats 1.0 as the reference peak, but mpv treats it as the nominal peak), but it shouldn't have caused any problems.

This is still not quite the same as libplacebo, since libplacebo also allows tagging the desired scene average brightness on the output, and it also supports reading the scene average brightness from static metadata (MaxFALL) where available. But those changes are a bit more involved. It's possible we could also read this from metadata in the future, but we have problems communicating with AVFrames as it is and I don't want to touch the mpv colorimetry structs for the time being.
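For reference, the "hable" curve that becomes the default here is the well-known filmic operator; below is a standalone C sketch using John Hable's published constants (the in-shader version may differ in details such as the normalization peak, which is an assumption here):

    /* Filmic tone mapping curve (Uncharted 2 / John Hable constants). */
    static float hable(float x)
    {
        const float A = 0.15f, B = 0.50f, C = 0.10f,
                    D = 0.20f, E = 0.02f, F = 0.30f;
        return (x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F) - E / F;
    }

    static float tone_map_hable(float x)
    {
        const float W = 11.2f;       /* assumed linear white point */
        return hable(x) / hable(W);  /* normalize so W maps to 1.0 */
    }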
* vo_gpu: make it possible to load multiple hwdec interop drivers (wm4, 2017-12-01, 1 file, -2/+5)
Make the VO<->decoder interface capable of supporting multiple hwdec APIs at once. The main gain is that this simplifies autoprobing a lot.

Before this change, it could happen that the VO loaded the "wrong" hwdec API, and the decoder was stuck with the choice (breaking hw decoding). With the change applied, the VO simply loads all available APIs, so autoprobing trickery is left entirely to the decoder.

In the past, we were quite careful about not accidentally loading the wrong interop drivers. This was in part to make sure autoprobing works, but also because libva had this obnoxious bug of dumping garbage to stderr when using the API. libva was fixed, so this is not a problem anymore.

The --opengl-hwdec-interop option is changed in various ways (again...), and renamed to --gpu-hwdec-interop. It does not have much use anymore, other than debugging. It's notable that the order in the hwdec interop array ra_hwdec_drivers[] still matters if multiple drivers support the same image formats, so the option can explicitly force one, if that should ever be necessary, or more likely, for debugging. One example is the pair of ra_hwdec_d3d11egl and ra_hwdec_d3d11eglrgb drivers, which both support d3d11 input.

vo_gpu now always loads the interop lazily by default, but when it does, it loads them all. vo_opengl_cb now always loads them when the GL context handle is initialized. I don't expect that this causes any problems. It's now possible to do things like changing between vdpau and nvdec decoding at runtime.

This is also preparation for cleaning up vd_lavc.c hwdec autoprobing. It's another reason why hwdec_devices_request_all() does not take a hwdec type anymore.
* vo_gpu: fix video sometimes not being rerendered on equalizer change (wm4, 2017-10-17, 1 file, -1/+0)
With video paused, changing the brightness controls (or similar) would sometimes not rerender the video frame. So the OSD would redraw, but the video wouldn't change.

This is caused by output caching, and a redraw request is free to return the cached frame. Change it so that it invalidates the cached frame if any of the options or the equalizer change.

In theory, gl_video_reset_surfaces() could be called if the equalizer changes - this would apparently force interpolation to redraw all frames. But this looks kind of crappy when changing the equalizer during playback. It'll "eventually" use the correct settings anyway, and when paused, interpolation is off.
* vo_gpu: make the vertex attribs dynamic (Niklas Haas, 2017-09-28, 1 file, -4/+0)
This has several advantages:

1. no more redundant texcoords when we don't need them
2. no more arbitrary limit on how many textures we can bind
3. (that extends to user shaders as well)
4. no more arbitrary limits on tscale radius

To realize this, the VAO was moved from a hacky stateful approach (gl_sc_set_vertex_attribs) - which always bothered me since it was required for compute shaders as well even though they ignored it - to be a proper parameter of gl_sc_dispatch_draw, and internally plumbed into gl_sc_generate, which will make a (properly mangled) deep copy into params.vertex_attribs.
* vo_gpu: simplify structs / names (Niklas Haas, 2017-09-22, 1 file, -2/+2)
Due to the plethora of historical baggage from different eras getting confusing, I decided to simplify and unify the struct organization and naming scheme.

Structs that got renamed:

1. fbodst     -> ra_fbo (and moved to gpu/context.h)
2. fbotex     -> removed (redundant after 2af2fa7a)
3. fbosurface -> surface
4. img_tex    -> image

In addition to these structs being renamed, all of the names have been made consistent. The new scheme is as follows:

    struct image img;
    struct ra_tex *tex;
    struct ra_fbo fbo;

This also affects derived names, e.g. indirect_fbo -> indirect_tex. Notably also, finish_pass_fbo -> finish_pass_tex and finish_pass_direct -> finish_pass_fbo.

The new equivalent of fbotex_change() is called ra_tex_resize().

This commit (should) contain no logic changes, just renaming a bunch of crap.
* vo_opengl: refactor into vo_gpu (Niklas Haas, 2017-09-21, 1 file, -0/+194)
This is done in several steps:

1. refactor MPGLContext -> struct ra_ctx
2. move GL-specific stuff in vo_opengl into opengl/context.c
3. generalize context creation to support other APIs, and add --gpu-api
4. rename all of the --opengl- options that are no longer opengl-specific
5. move all of the stuff from opengl/* that isn't GL-specific into gpu/ (note: opengl/gl_utils.h became opengl/utils.h)
6. rename vo_opengl to vo_gpu
7. to handle window screenshots, the short-term approach was to just add it to ra_swchain_fns. Long term (and for vulkan) this has to be moved to ra itself (and vo_gpu altered to compensate), but this was a stop-gap measure to prevent this commit from getting too big
8. move ra->fns->flush to ra_gl_ctx instead
9. some other minor changes that I've probably already forgotten

Note: This is one half of a major refactor, the other half of which is provided by rossy's following commit. This commit enables support for all linux platforms, while his version enables support for all non-linux platforms.

Note 2: vo_opengl_cb.c also re-uses ra_gl_ctx so it benefits from the --opengl- options like --opengl-early-flush, --opengl-finish etc. Should be a strict superset of the old functionality.

Disclaimer: Since I have no way of compiling mpv on all platforms, some of these ports were done blindly. Specifically, the blind ports included context_mali_fbdev.c and context_rpi.c. Since they're both based on egl_helpers, the port should have gone smoothly without any major changes required. But if somebody complains about a compile error on those platforms (assuming anybody actually uses them), you know where to complain.