path: root/video/out/opengl/video_shaders.c
Commit message (author, age, files, lines)
* vo_opengl: refactor into vo_gpu (Niklas Haas, 2017-09-21, 1 file, -872/+0)
| This is done in several steps:
|
| 1. refactor MPGLContext -> struct ra_ctx
| 2. move GL-specific stuff in vo_opengl into opengl/context.c
| 3. generalize context creation to support other APIs, and add --gpu-api
| 4. rename all of the --opengl- options that are no longer opengl-specific
| 5. move all of the stuff from opengl/* that isn't GL-specific into gpu/ (note: opengl/gl_utils.h became opengl/utils.h)
| 6. rename vo_opengl to vo_gpu
| 7. to handle window screenshots, the short-term approach was to just add it to ra_swchain_fns. Long term (and for vulkan) this has to be moved to ra itself (and vo_gpu altered to compensate), but this was a stop-gap measure to prevent this commit from getting too big
| 8. move ra->fns->flush to ra_gl_ctx instead
| 9. some other minor changes that I've probably already forgotten
|
| Note: This is one half of a major refactor, the other half of which is provided by rossy's following commit. This commit enables support for all linux platforms, while his version enables support for all non-linux platforms.
|
| Note 2: vo_opengl_cb.c also re-uses ra_gl_ctx so it benefits from the --opengl- options like --opengl-early-flush, --opengl-finish etc. Should be a strict superset of the old functionality.
|
| Disclaimer: Since I have no way of compiling mpv on all platforms, some of these ports were done blindly. Specifically, the blind ports included context_mali_fbdev.c and context_rpi.c. Since they're both based on egl_helpers, the port should have gone smoothly without any major changes required. But if somebody complains about a compile error on those platforms (assuming anybody actually uses them), you know where to complain.
* vo_opengl: add a gamut warning feature (Niklas Haas, 2017-09-10, 1 file, -2/+8)
| | | | | | | | This clearly highlights all out-of-gamut/clipped pixels. (Either too bright or too saturated) Has some (documented) caveats. Also make TONE_MAPPING_CLIP stop actually clamping the value range (it's unnecessary and breaks this feature).
* vo_opengl: add support for vulkan GLSL dialect (Niklas Haas, 2017-09-04, 1 file, -1/+1)
| | | | | | | | | | Redefining texture1D / texture3D seems to be illegal, they are already built-in macros or something. So just use tex1D and tex3D instead. Additionally, GL_KHR_vulkan_glsl requires using explicit vertex locations and bindings, so make some changes to facilitate this. (It also requires explicitly setting location=0 for the color attachment output)
* vo_opengl: refactor scaler LUT weight packing/loading (Niklas Haas, 2017-09-04, 1 file, -19/+8)
| | | | | | | | This is mostly done so we can support using textures with more components than the scaler LUTs have entries. But while we're at it, also change the way the weights are packed so that they're always sequential with no gaps. This allows us to simplify pass_sample_separated_get_weights as well.
* vo_opengl: scale deband-grain to the signal range (Niklas Haas, 2017-09-03, 1 file, -2/+5)
| | | | | This prevents blowing up for high dynamic range sources, where a noise level of 48 can suddenly mean 4800.
* vo_opengl: don't hardcode texmap0 for polar compute (Niklas Haas, 2017-08-03, 1 file, -1/+3)
| | | | | This was an oversight. The ID shouldn't be hard-coded here, so add it to sampler_prelude instead.
* vo_opengl: fix constexprs on ANGLE (Niklas Haas, 2017-08-03, 1 file, -6/+6)
| | | | I hate GLES
* vo_opengl: fix HLG OOTF inverse (Niklas Haas, 2017-08-03, 1 file, -1/+1)
| | | | Got the "sign" of the second multiplication wrong.
* vo_opengl: generalize HDR tone mapping to gamut mapping (Niklas Haas, 2017-08-03, 1 file, -12/+17)
| Since this code was already written for HDR, and is now per-channel (because it works better for HDR as well), we can actually reuse this to get very high quality gamut mapping without clipping. The only required change is to move the tone mapping from before the gamut map to after the gamut map.
|
| Additionally, we need to also account for changes in the signal range as a result of applying the CMS when we compute ref_peak, which is fortunately pretty easy because we only need to consider the case of primaries mapping to themselves.
|
| Since `HDR` no longer really makes sense as a label, rename it to `--tone-mapping` in general. Also fits better with `--tone-mapping-desat` etc. Arguably we could also rename `--hdr-compute-peak`, but that option is basically only useful for HDR content anyway because we don't need information about the signal range for gamut mapping.
|
| This (finally!) gives us reasonably high quality gamut mapping even in the absence of an ICC profile / 3DLUT.
* vo_opengl: implement HLG OOTF inverse (Niklas Haas, 2017-08-03, 1 file, -8/+3)
| | | | | | | Huge thanks to @rusxg for finding this solution, which was previously believed not to exist. Of course, we still don't actually need it, but I don't want to leave this half-implemented in case somebody does in the future.
* vo_opengl: manage scaler LUT textures via ra (wm4, 2017-07-29, 1 file, -4/+4)
| | | | Also fix the RA_CAP_ bitmask nonsense.
* vo_opengl: tone map on the maximum signal component (Niklas Haas, 2017-07-27, 1 file, -19/+25)
| | | | | | | | | | This actually makes sure we don't decolor due to clipping even when the signal itself exceeds the luma by a significant factor, which was pretty common for saturated blues (and to a lesser degree, reds) - most noticeable in skies etc. This prevents the turn-the-sky-cyan effect of mobius tone mapping, and should also improve the other tone mapping modes in quality.
* vo_opengl: check against shmem limits (Niklas Haas, 2017-07-26, 1 file, -8/+3)
| | | | | | The radius check was not strict enough, especially not for all platforms. To fix this, actually check the hardware capabilities instead of relying on a hard-coded maximum radius.
* vo_opengl: enable compute shader for mesa (Bin Jin, 2017-07-25, 1 file, -0/+1)
| Mesa 17.1 supports compute shaders but not the full OpenGL 4.3 spec. Change the code to detect the OpenGL extension "GL_ARB_compute_shader" rather than OpenGL version 4.3. HDR peak detection requires SSBO, and the polar scaler requires the 2D array extension. Add these extensions as requirements as well.
* vo_opengl: implement compute shader based EWA kernel (Niklas Haas, 2017-07-24, 1 file, -6/+63)
| | | | | | | | | | | This performs almost 50% faster on my machine (!!), from 4650μs down to about 3176μs for ewa_lanczossharp. It's possible we could use a similar approach to speed up the separable scalers, although with vastly simpler code. For separable scalers we'd also have the additional huge benefit of only needing padding in one direction, so we could potentially use a big 256x1 kernel or something to essentially compute an entire row at once.
* vo_opengl: support HDR peak detection (Niklas Haas, 2017-07-24, 1 file, -20/+58)
| | | | | | | | | | | | | | This is done via compute shaders. As a consequence, the tone mapping algorithms had to be rewritten to compute their known constants in GLSL (ahead of time), instead of doing it once. Didn't affect performance. Using shmem/SSBO atomics in this way is extremely fast on nvidia, but it might be slow on other platforms. Needs testing. Unfortunately, setting up the SSBO still requires OpenGL calls, which means I can't have it in video_shaders.c, where it belongs. But I'll defer worrying about that until the backend refactor, since then I'll be breaking up the video/video_shaders structure anyway.
* vo_opengl: check format on some printf-like calls (wm4, 2017-07-24, 1 file, -1/+1)
| | | | Fix 1 incorrect use.
* vo_opengl: avoid constant divisions (Niklas Haas, 2017-07-17, 1 file, -31/+32)
| These are apparently expensive on some drivers which are not smart enough to turn x/42 into x*(1.0/42). So, do it for them. My great test framework says it's okay.
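A minimal sketch of the idea in C, assuming the shader text is produced with printf-style formatting: the reciprocal is computed on the CPU, so the generated GLSL contains only a multiplication. The helper name `emit_reciprocal_scale` is hypothetical and is not a function from the mpv source.

```c
#include <stdio.h>

/* Write a GLSL statement that scales `var` by 1/divisor. The reciprocal is
 * computed here on the CPU, so the generated shader only multiplies. */
static int emit_reciprocal_scale(char *buf, size_t len,
                                 const char *var, double divisor)
{
    return snprintf(buf, len, "%s *= %f;", var, 1.0 / divisor);
}
```

Calling it with `"color.rgb"` and a divisor of 4.0 writes a statement that multiplies by 0.25 instead of dividing by 4.0.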
* vo_opengl: support tone-mapping-param for `clip` (Niklas Haas, 2017-07-07, 1 file, -1/+1)
| | | | | | | This just indicates a fixed linear coefficient to multiply into the signal, similar to the old option --target-brightness (but the inverse thereof). Good for testing purposes, which is why I added it. (This also corresponds somewhat to what zimg does)
* vo_opengl: correct off-by-one in scale=oversample (Niklas Haas, 2017-07-07, 1 file, -1/+1)
| | | | This caused a single pixel shift to the top-left, introduced in e3e03d0f3.
* vo_opengl: prevent desat from blowing up for negative (Niklas Haas, 2017-07-07, 1 file, -1/+1)
| | | | | | | The current algorithm blew up when the color was negative, such as the case when downscaling with dscale=mitchell or other algorithms that introduce negative ringing. The simplest solution is to just slightly change the calculation to force both parameters to be in-range.
* vo_opengl: prevent division by zero in shader (Niklas Haas, 2017-07-06, 1 file, -1/+1)
| | | | | In theory the max() should clamp it away anyway but I believe division by zero is UB so just avoid it altogether.
* vo_opengl: add --tone-mapping-desaturate (Niklas Haas, 2017-07-06, 1 file, -4/+12)
| | | | | | | This helps prevent unnaturally, weirdly colorized blown out highlights for direct images of the sunlit sky and other way-too-bright HDR content. I was debating whether to set the default at 1.0 or 2.0, but went with the more conservative option that preserves more detail/color.
* vo_opengl: use textureGatherOffset for polar filters (Niklas Haas, 2017-07-05, 1 file, -40/+84)
| This is more efficient on my machine (nvidia), but only when applied to groups of exactly 4 texels. So we switch to the more efficient textureGather for groups of 4.
|
| Some notes:
|
| - textureGatherOffset seems to be faster than textureGather by a non-negligible amount, but for some reason, textureOffset is still slower than a straight-up texture
| - textureGather* requires GLSL 400; and at least on nvidia, this requires actually allocating a GL 4.0 context.
| - the code in opengl/common.c that clamped the GLSL version to 330 is deprecated, because the old user shader style has been removed completely in the meantime
| - To combat the growing complexity of the polar sampling code, we drop the antiringing functionality from EWA shaders completely, since it never really worked well for EWA to begin with. (Horrific artifacting)
* filter_kernels: add radius cutoff functionality (Niklas Haas, 2017-07-03, 1 file, -5/+8)
| | | | | | | | This allows filter functions to be prematurely cut off once their contributions start becoming insignificant. This effectively prevents wasted GPU time sampling from parts of the function that are essentially reduced to zero by the window function, providing anywhere from a 10% to 20% speedup. (5700μs -> 4700μs for me)
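One way to picture the cutoff, as a hedged C sketch: walk inward from the nominal radius until the kernel's magnitude first exceeds a threshold, and only sample up to that point. `find_cutoff_radius`, `kernel_fn` and the triangle kernel are hypothetical names chosen for this example; the actual filter_kernels code may be organised differently.

```c
#include <math.h>

typedef double (*kernel_fn)(double x);

/* Walk inward from `radius` in `steps` equal steps and return the first
 * radius at which |kernel| exceeds `cutoff`. Returns 0.0 if it never does. */
static double find_cutoff_radius(kernel_fn kernel, double radius,
                                 double cutoff, int steps)
{
    for (int i = steps; i > 0; i--) {
        double r = radius * i / steps;
        if (fabs(kernel(r)) > cutoff)
            return r;
    }
    return 0.0;
}

/* Example kernel: a triangle (linear) filter with radius 2. */
static double triangle2(double x)
{
    double w = 1.0 - fabs(x) / 2.0;
    return w > 0.0 ? w : 0.0;
}
```

With the triangle kernel, a nominal radius of 2 and a cutoff of 0.25, the search stops just inside radius 1.5, so the outer quarter of the kernel would no longer be sampled.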
* vo_opengl: refactor vo performance subsystem (Niklas Haas, 2017-07-01, 1 file, -1/+0)
| This replaces `vo-performance` by `vo-passes`, bringing with it a number of changes and improvements:
|
| 1. mpv users can now introspect the vo_opengl passes, which is something that has been requested multiple times.
| 2. performance data is now measured per-pass, which helps both development and debugging.
| 3. since adding more passes is cheap, we can now report information for more passes (e.g. the blit pass, and the osd pass). Note: we also switch to nanosecond scale, to be able to measure these passes better.
| 4. `--user-shaders` authors can now describe their own passes, helping users both identify which user shaders are active at any given time as well as helping shader authors identify performance issues.
| 5. the timing data per pass is now exported as a full list of samples, so projects like Argon-/mpv-stats can immediately read out all of the samples and render a graph without having to manually poll this option constantly.
|
| Due to gl_timer's design being complicated (directly reading performance data would block, so we delay the actual read-back until the next _start command), it's vital not to conflate different passes that might be doing different things from one frame to another. To accomplish this, the actual timers are stored as part of the gl_shader_cache's sc_entry, which makes them unique for that exact shader.
|
| Starting and stopping the time measurement is easy to unify with the gl_sc architecture, because the existing API already relies on a "generate, render, reset" flow, so we can just put timer_start and timer_stop in sc_generate and sc_reset, respectively.
|
| The ugliest thing about this code is that due to the need to keep pass information relatively stable in between frames, we need to distinguish between "new" and "redrawn" frames, which bloats the code somewhat and also feels hacky and vo_opengl-specific. (But then again, this entire thing is vo_opengl-specific)
* vo_opengl: tone map using only luminance information (Niklas Haas, 2017-06-27, 1 file, -33/+24)
| This is even better at preventing discoloration than tone mapping on the XYZ image. Partly inspired by the HLG OOTF. Also simplifies the way we tone map, and moves this logic to the pass_tone_map function where it belongs.
|
| This also fixes what could arguably be considered a bug in the HLG implementation when using HLG for non-BT.2020 colorspaces, which is not permitted by spec but thinkable in theory. Although in this case, I guess it will be arbitrary whether people use the BT.2020-normalized luma coefficients or change it to fit the colorspace, so I guess either way could be considered "right", depending on what people end up doing. Either way, in lieu of standard practice, we do what makes the most sense (to me), and hopefully others will follow.
|
| The downside is that we upload an extra vec3 uniform even if we don't use it, but eliminating that would be ugly.
* vo_opengl: implement sony s-log2 trc (Niklas Haas, 2017-06-18, 1 file, -1/+18)
| | | | | | | | Apparently this is virtually identical to Panasonic's V-Log, but using the constants from S-Log1 and an extra scaling coefficient to make the S-Log1 curve less limited. Whatever floats their NIH boat, I guess. Source: https://pro.sony.com/bbsccms/assets/files/micro/dmpc/training/S-Log2_Technical_PaperV1_0.pdf
* vo_opengl: implement sony s-log1 trc (Niklas Haas, 2017-06-18, 1 file, -0/+14)
| | | | | | | | Source: https://pro.sony.com/bbsccms/assets/files/mkt/cinema/solutions/slog_manual.pdf Not 100% confident in the implementation since the values from the spec seem to be very subtly off (~1%), but it should be close enough for practical purposes.
* vo_opengl: tone map in linear XYZ instead of RGB (Niklas Haas, 2017-06-18, 1 file, -1/+19)
| | | | | | | | | | | | | This preserves channel balance better and helps reduce discoloration due to nonlinear tone mapping. I wasn't sure whether to stuff this inside pass_color_manage or pass_tone_map but decided for the former because adding the extra mp_csp_prim would have made the signature of the latter longer than 80col, and also because the `mp_get_cms_matrix` below it basically does the same thing anyway, so it doesn't look that out of place. Also why is this justification longer than the actual description of the algorithm and what it's good for?
* vo_opengl: implement support for OOTFs and non-display referred content (Niklas Haas, 2017-06-18, 1 file, -8/+82)
| This introduces (yet another..) mp_colorspace member, an enum `light` (for lack of a better name) which basically tells us whether we're dealing with scene-referred or display-referred light, but also a bit more metadata (in which way is the scene-referred light expected to be mapped to the display?).
|
| The addition of this parameter accomplishes two goals:
|
| 1. Allows us to actually support HLG more-or-less correctly[1]
| 2. Allows people playing back direct "camera" content (e.g. v-log or s-log2) to treat it as scene-referred instead of display-referred
|
| [1] Even better would be to use the display-referred OOTF instead of the idealized OOTF, but this would require either native HLG support in LittleCMS (unlikely) or more communication between lcms.c and video_shaders.c than I'm remotely comfortable with.
|
| That being said, in principle we could switch our usage of the BT.1886 EOTF to the BT.709 OETF instead and treat BT.709 content as being scene-referred under application of the 709+1886 OOTF; which moves that particular conversion from the 3dlut to the shader code; but also allows a) users like UliZappe to turn it off and b) supporting the full HLG OOTF in the same framework. But I think I prefer things as they are right now.
* csputils: rename HDR curves (Niklas Haas, 2017-06-18, 1 file, -21/+21)
| | | | | | | | | | st2084 and std-b67 are really weird names for PQ and HLG, which is what everybody else (including e.g. the ITU-R) calls them. Follow their example. I decided against naming them bt2020-pq and bt2020-hlg because it's not necessary in this case. The standard name is only used for the other colorspaces etc. because those literally have no other names.
* video: refactor HDR implementation (Niklas Haas, 2017-06-18, 1 file, -37/+47)
| List of changes:
|
| 1. Kill nom_peak, since it's a pointless non-field that stores nothing of value and is _always_ derived from ref_white anyway.
| 2. Kill ref_white/--target-brightness, because the only case it really existed for (PQ) actually doesn't need to be this general: According to ITU-R BT.2100, PQ *always* assumes a reference monitor with a white point of 100 cd/m².
| 3. Improve documentation and comments surrounding this stuff.
| 4. Clean up some of the code in general. Move stuff where it belongs.
* vo_opengl: add new HDR tone mapping algorithm (Niklas Haas, 2017-06-09, 1 file, -0/+15)
| I call it `mobius` because apparently the form f(x) = (cx+a)/(dx+b) is called a Möbius transform, which is the algorithm this is based on. In the extremes it becomes `reinhard` (param=0.0) and `clip` (param=1.0), smoothly transitioning between the two depending on the parameter.
|
| This is a useful tone mapping algorithm since the tunable mobius transform allows the user to decide the trade-off between color accuracy and detail preservation on a continuous scale. The default of 0.3 is already far more accurate than `reinhard` while also being reasonably good at preserving highlights, without suffering from the overall brightness drop and color distortion of `hable`.
|
| For these reasons, make this the new default. Also expand and improve the documentation for these tone mapping functions.
* filter_kernels: Keep f.radius in terms of dest/filter coords. (Nicholas J. Kain, 2017-03-06, 1 file, -2/+2)
| | | | | | | | | | | | | The existing code modifies f.radius so that it is in terms of the filter sample radius (in the source coordinate space) and has some small errors because of this behavior. This commit changes f.radius so that it is always in terms of the filter function radius (in the destination coordinate space). The sample radius can always be derived by multiplying f.radius by filter_scale, which is the new, more descriptive name for the previous inv_scale.
* vo_opengl: dynamically manage texture units (wm4, 2016-09-14, 1 file, -4/+2)
| | | | | | | | | | | | | | | | | | | | | A minor cleanup that makes the code simpler, and guarantees that we cleanup the GL state properly at any point. We do this by reusing the uniform caching, and assigning each sampler uniform its own texture unit by incrementing a counter. This has various subtle consequences for the GL driver, which hopefully don't matter. For example, it will bind fewer textures at a time, but also rebind them more often. For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number of hook passes that can be bound at the same time. OSD rendering is an exception: we do many passes with the same shader, and rebinding the texture each pass. For now, this is handled in an unclean way, and we make the shader cache reserve texture unit 0 for the OSD texture. At a later point, we should allocate that one dynamically too, and just pass the texture unit to the OSD rendering code. Right now I feel like vo_rpi.c (may it rot in hell) is in the way.
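The counter-based assignment described above could look roughly like the following C sketch. `struct unit_allocator`, `units_reset` and `units_alloc` are hypothetical names for this example and do not correspond to identifiers in the shader cache; the reserved count models keeping unit 0 for the OSD texture.

```c
/* Hands out texture units by incrementing a counter. */
struct unit_allocator {
    int next_unit;  /* next free texture unit */
    int max_units;  /* number of units available (assumed to be known) */
};

/* Start a new shader: units below `reserved` stay untouched
 * (e.g. reserved = 1 keeps unit 0 for the OSD texture). */
static void units_reset(struct unit_allocator *a, int reserved)
{
    a->next_unit = reserved;
}

/* Assign the next free unit to a sampler uniform, or -1 if none is left. */
static int units_alloc(struct unit_allocator *a)
{
    if (a->next_unit >= a->max_units)
        return -1;
    return a->next_unit++;
}
```

Resetting the counter for every generated shader is what makes the assignment dynamic: each sampler uniform simply gets whichever unit comes next, at the cost of rebinding textures more often.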
* csp: document deviations from the references where they occur (Niklas Haas, 2016-07-05, 1 file, -2/+22)
| | | | | | | | | | These mostly happen in situations where the correct behavior is relatively new and not found in the wild (therefore not worth implementing) and/or extremely complicated (and thus not worth worrying about the potential edge cases and UI changes). Still, it's best to document these where they happen to guide the poor souls maintaining these files in the future.
* vo_opengl: generalize HDR tone mapping mechanism (Niklas Haas, 2016-07-03, 1 file, -6/+51)
| This involves multiple changes:
|
| 1. Brightness metadata is split into nominal peak and signal peak. For a quick and dirty explanation: nominal peak is the brightest value that your color space can represent (i.e. the brightness of an encoded 1.0), and signal peak is the brightest value that actually occurs in the video (i.e. the brightest thing that's displayed).
| 2. vo_opengl uses a new decision logic to figure out the right nom_peak and sig_peak for all situations. It also does a better job of picking the right target gamut/colorspace to use for the OSD. (Which still is and still should be treated as sRGB). This change in logic also fixes #3293 en passant.
| 3. Since it was growing rapidly, the logic for auto-guessing / inferring the right colorimetry configuration (in pass_colormanage) was split from the logic for actually performing the adaptation (now pass_color_map).
|
| Right now, the new logic doesn't do a whole lot since HDR metadata is still ignored (but not for long).
* vo_opengl: implement the Panasonic V-Log function (Niklas Haas, 2016-06-28, 1 file, -0/+24)
| | | | | | | | | | User request and not that hard. Closes #3157. Note that FFmpeg doesn't support this and there's no signalling in HEVC etc., so the only way users can access it is by using vf_format manually. Mind: This encoding uses full range values, not TV range.
* vo_opengl: implement ARIB STD-B68 (HLG) HDR TRC (Niklas Haas, 2016-06-28, 1 file, -0/+23)
| | | | | | | | | | | | | | This HDR function is unique in that it's still display-referred, it just allows for values above the reference peak (super-highlights). The official standard doesn't actually document this very well, but the nominal peak turns out to be exactly 12.0 - so we normalize to this value internally in mpv. (This lets us preserve the property that the textures are encoded in the range [0,1], preventing clipping and making the best use of an integer texture's range) This was grouped together with SMPTE ST2084 when checking libavutil compatibility since they were added in the same release window, in a similar timeframe.
* vo_opengl: refactor HDR mechanism (Niklas Haas, 2016-05-30, 1 file, -11/+6)
| Instead of doing HDR tone mapping on an ad-hoc basis inside pass_colormanage, the reference peak of an image is now part of the image params (alongside colorspace, gamma, etc.) and tone mapping is done whenever peak_src != peak_dst.
|
| To get sensible behavior when mixing HDR and SDR content and displays, target-brightness is a generic filler for "the assumed brightness of SDR content".
|
| This gets rid of the weird display_scaled hack, sets the framework for multiple HDR functions with different reference peaks, and allows us to (in a future commit) autodetect the right source peak from the HDR metadata.
|
| (Apart from metadata, the source peak can also be controlled via vf_format. For HDR content this adjusts the overall image brightness, for SDR content it's like simulating a different exposure)
* vo_opengl: add hable tone-mapping algorithm (Niklas Haas, 2016-05-30, 1 file, -0/+11)
| | | | | | | | Developed by John Hable for use in Uncharted 2. Also used by Frictional Games in SOMA. Originally inspired by a filmic tone mapping algorithm created by Kodak. From http://frictionalgames.blogspot.de/2012/09/tech-feature-hdr-lightning.html
* vo_opengl: rename tone-mapping=simple to reinhard (Niklas Haas, 2016-05-30, 1 file, -1/+1)
| | | | | This is the canonical name for the algorithm. I simply didn't know it before.
* vo_opengl: fix bicubic_fast in ES mode (wm4, 2016-05-16, 1 file, -1/+1)
| | | | | | GLES shaders disallow implicit conversion from int to float. This has been broken for quite a while.
* vo_opengl: implement more HDR tonemapping algorithms (Niklas Haas, 2016-05-16, 1 file, -0/+40)
| | | | | | | | | | | | | | | | | | | | | This is now a configurable option, with tunable parameters. I got inspiration for these algorithms off wikipedia. "simple" seems to work pretty well, but not well enough to make it a reasonable default. Some other notable candidates: - Local functions (e.g. based on local c