summaryrefslogtreecommitdiffstats
path: root/video/out/opengl/video_shaders.c
Commit message (Collapse)AuthorAgeFilesLines
* vo_opengl: use textureGatherOffset for polar filtersNiklas Haas2017-07-051-40/+84
| | | | | | | | | | | | | | | | | | This is more efficient on my machine (nvidia), but only when applied to groups of exactly 4 texels. So we switch to the more efficient textureGather for groups of 4. Some notes: - textureGatherOffset seems to be faster than textureGather by a non-negligible amount, but for some reason, textureOffset is still slower than a straight-up texture - textureGather* requires GLSL 400; and at least on nvidia, this requires actually allocating a GL 4.0 context. - the code in opengl/common.c that clamped the GLSL version to 330 is deprecated, because the old user shader style has been removed completely in the meantime - To combat the growing complexity of the polar sampling code, we drop the antiringing functionality from EWA shaders completely, since it never really worked well for EWA to begin with. (Horrific artifacting)
* filter_kernels: add radius cutoff functionalityNiklas Haas2017-07-031-5/+8
| | | | | | | | This allows filter functions to be prematurely cut off once their contributions start becoming insignificant. This effectively prevents wasted GPU time sampling from parts of the function that are essentially reduced to zero by the window function, providing anywhere from a 10% to 20% speedup. (5700μs -> 4700μs for me)
* vo_opengl: refactor vo performance subsystemNiklas Haas2017-07-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces `vo-performance` by `vo-passes`, bringing with it a number of changes and improvements: 1. mpv users can now introspect the vo_opengl passes, which is something that has been requested multiple times. 2. performance data is now measured per-pass, which helps both development and debugging. 3. since adding more passes is cheap, we can now report information for more passes (e.g. the blit pass, and the osd pass). Note: we also switch to nanosecond scale, to be able to measure these passes better. 4. `--user-shaders` authors can now describe their own passes, helping users both identify which user shaders are active at any given time as well as helping shader authors identify performance issues. 5. the timing data per pass is now exported as a full list of samples, so projects like Argon-/mpv-stats can immediately read out all of the samples and render a graph without having to manually poll this option constantly. Due to gl_timer's design being complicated (directly reading performance data would block, so we delay the actual read-back until the next _start command), it's vital not to conflate different passes that might be doing different things from one frame to another. To accomplish this, the actual timers are stored as part of the gl_shader_cache's sc_entry, which makes them unique for that exact shader. Starting and stopping the time measurement is easy to unify with the gl_sc architecture, because the existing API already relies on a "generate, render, reset" flow, so we can just put timer_start and timer_stop in sc_generate and sc_reset, respectively. The ugliest thing about this code is that due to the need to keep pass information relatively stable in between frames, we need to distinguish between "new" and "redrawn" frames, which bloats the code somewhat and also feels hacky and vo_opengl-specific. (But then again, this entire thing is vo_opengl-specific)
* vo_opengl: tone map using only luminance informationNiklas Haas2017-06-271-33/+24
| | | | | | | | | | | | | | | | | | | This is even better at preventing discoloration than tone mapping on the XYZ image. Partly inspired by the HLG OOTF. Also simplifies the way we tone map, and moves this logic to the pass_tone_map function where it belongs. This also fixes what could arguably be considered a bug in the HLG implementation when using HLG for non-BT.2020 colorspaces, which is not permitted by spec but thinkable in theory. Although in this case, I guess it will be arbitrary whether people use the BT.2020-normalized luma coefficients or change it to fit the colorspace, so I guess either way could be considered "right", depending on what people end up doing. Either way, in lieue of standard practice, we do what makes the most sense (to me), and hopefully others will follow. The downside is that we upload an extra vec3 uniform even if we don't use it, but eliminating that would be ugly.
* vo_opengl: implement sony s-log2 trcNiklas Haas2017-06-181-1/+18
| | | | | | | | Apparently this is virtually identical to Panasonic's V-Log, but using the constants from S-Log1 and an extra scaling coefficient to make the S-Log1 curve less limited. Whatever floats their NIH boat, I guess. Source: https://pro.sony.com/bbsccms/assets/files/micro/dmpc/training/S-Log2_Technical_PaperV1_0.pdf
* vo_opengl: implement sony s-log1 trcNiklas Haas2017-06-181-0/+14
| | | | | | | | Source: https://pro.sony.com/bbsccms/assets/files/mkt/cinema/solutions/slog_manual.pdf Not 100% confident in the implementation since the values from the spec seem to be very subtly off (~1%), but it should be close enough for practical purposes.
* vo_opengl: tone map in linear XYZ instead of RGBNiklas Haas2017-06-181-1/+19
| | | | | | | | | | | | | This preserves channel balance better and helps reduce discoloration due to nonlinear tone mapping. I wasn't sure whether to stuff this inside pass_color_manage or pass_tone_map but decided for the former because adding the extra mp_csp_prim would have made the signature of the latter longer than 80col, and also because the `mp_get_cms_matrix` below it basically does the same thing anyway, so it doesn't look that out of place. Also why is this justification longer than the actual description of the algorithm and what it's good for?
* vo_opengl: implement support for OOTFs and non-display referred contentNiklas Haas2017-06-181-8/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces (yet another..) mp_colorspace members, an enum `light` (for lack of a better name) which basically tells us whether we're dealing with scene-referred or display-referred light, but also a bit more metadata (in which way is the scene-referred light expected to be mapped to the display?). The addition of this parameter accomplishes two goals: 1. Allows us to actually support HLG more-or-less correctly[1] 2. Allows people playing back direct “camera” content (e.g. v-log or s-log2) to treat it as scene-referred instead of display-referred [1] Even better would be to use the display-referred OOTF instead of the idealized OOTF, but this would require either native HLG support in LittleCMS (unlikely) or more communication between lcms.c and video_shaders.c than I'm remotely comfortable with That being said, in principle we could switch our usage of the BT.1886 EOTF to the BT.709 OETF instead and treat BT.709 content as being scene-referred under application of the 709+1886 OOTF; which moves that particular conversion from the 3dlut to the shader code; but also allows a) users like UliZappe to turn it off and b) supporting the full HLG OOTF in the same framework. But I think I prefer things as they are right now.
* csputils: rename HDR curvesNiklas Haas2017-06-181-21/+21
| | | | | | | | | | st2084 and std-b67 are really weird names for PQ and HLG, which is what everybody else (including e.g. the ITU-R) calls them. Follow their example. I decided against naming them bt2020-pq and bt2020-hlg because it's not necessary in this case. The standard name is only used for the other colorspaces etc. because those literally have no other names.
* video: refactor HDR implementationNiklas Haas2017-06-181-37/+47
| | | | | | | | | | | | | | | List of changes: 1. Kill nom_peak, since it's a pointless non-field that stores nothing of value and is _always_ derived from ref_white anyway. 2. Kill ref_white/--target-brightness, because the only case it really existed for (PQ) actually doesn't need to be this general: According to ITU-R BT.2100, PQ *always* assumes a reference monitor with a white point of 100 cd/m². 3. Improve documentation and comments surrounding this stuff. 4. Clean up some of the code in general. Move stuff where it belongs.
* vo_opengl: add new HDR tone mapping algorithmNiklas Haas2017-06-091-0/+15
| | | | | | | | | | | | | | | | | I call it `mobius` because apparently the form f(x) = (cx+a)/(dx+b) is called a Möbius transform, which is the algorithm this is based on. In the extremes it becomes `reinhard` (param=0.0 and `clip` (param=1.0), smoothly transitioning between the two depending on the parameter. This is a useful tone mapping algorithm since the tunable mobius transform allows the user to decide the trade-off between color accuracy and detail preservation on a continuous scale. The default of 0.3 is already far more accurate than `reinhard` while also being reasonably good at preserving highlights, without suffering from the overall brightness drop and color distortion of `hable`. For these reasons, make this the new default. Also expand and improve the documentation for these tone mapping functions.
* filter_kernels: Keep f.radius in terms of dest/filter coords.Nicholas J. Kain2017-03-061-2/+2
| | | | | | | | | | | | | The existing code modifies f.radius so that it is in terms of the filter sample radius (in the source coordinate space) and has some small errors because of this behavior. This commit changes f.radius so that it is always in terms of the filter function radius (in the destination coordinate space). The sample radius can always be derived by multiplying f.radius by filter_scale, which is the new, more descriptive name for the previous inv_scale.
* vo_opengl: dynamically manage texture unitswm42016-09-141-4/+2
| | | | | | | | | | | | | | | | | | | | | A minor cleanup that makes the code simpler, and guarantees that we cleanup the GL state properly at any point. We do this by reusing the uniform caching, and assigning each sampler uniform its own texture unit by incrementing a counter. This has various subtle consequences for the GL driver, which hopefully don't matter. For example, it will bind fewer textures at a time, but also rebind them more often. For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number of hook passes that can be bound at the same time. OSD rendering is an exception: we do many passes with the same shader, and rebinding the texture each pass. For now, this is handled in an unclean way, and we make the shader cache reserve texture unit 0 for the OSD texture. At a later point, we should allocate that one dynamically too, and just pass the texture unit to the OSD rendering code. Right now I feel like vo_rpi.c (may it rot in hell) is in the way.
* csp: document deviations from the references where they occurNiklas Haas2016-07-051-2/+22
| | | | | | | | | | These mostly happen in situations where the correct behavior is relatively new and not found in the wild (therefore not worth implementing) and/or extremely complicated (and thus not worth worrying about the potential edge cases and UI changes). Still, it's best to document these where they happen to guide the poor souls maintaining these files in the future.
* vo_opengl: generalize HDR tone mapping mechanismNiklas Haas2016-07-031-6/+51
| | | | | | | | | | | | | | | | | | | | | | | | This involves multiple changes: 1. Brightness metadata is split into nominal peak and signal peak. For a quick and dirty explanation: nominal peak is the brightest value that your color space can represent (i.e. the brightness of an encoded 1.0), and signal peak is the brightest value that actually occurs in the video (i.e. the brightest thing that's displayed). 2. vo_opengl uses a new decision logic to figure out the right nom_peak and sig_peak for all situations. It also does a better job of picking the right target gamut/colorspace to use for the OSD. (Which still is and still should be treated as sRGB). This change in logic also fixes #3293 en passant. 3. Since it was growing rapidly, the logic for auto-guessing / inferring the right colorimetry configuration (in pass_colormanage) was split from the logic for actually performing the adaptation (now pass_color_map). Right now, the new logic doesn't do a whole lot since HDR metadata is still ignored (but not for long).
* vo_opengl: implement the Panasonic V-Log functionNiklas Haas2016-06-281-0/+24
| | | | | | | | | | User request and not that hard. Closes #3157. Note that FFmpeg doesn't support this and there's no signalling in HEVC etc., so the only way users can access it is by using vf_format manually. Mind: This encoding uses full range values, not TV range.
* vo_opengl: implement ARIB STD-B68 (HLG) HDR TRCNiklas Haas2016-06-281-0/+23
| | | | | | | | | | | | | | This HDR function is unique in that it's still display-referred, it just allows for values above the reference peak (super-highlights). The official standard doesn't actually document this very well, but the nominal peak turns out to be exactly 12.0 - so we normalize to this value internally in mpv. (This lets us preserve the property that the textures are encoded in the range [0,1], preventing clipping and making the best use of an integer texture's range) This was grouped together with SMPTE ST2084 when checking libavutil compatibility since they were added in the same release window, in a similar timeframe.
* vo_opengl: refactor HDR mechanismNiklas Haas2016-05-301-11/+6
| | | | | | | | | | | | | | | | | | | | Instead of doing HDR tone mapping on an ad-hoc basis inside pass_colormanage, the reference peak of an image is now part of the image params (alongside colorspace, gamma, etc.) and tone mapping is done whenever peak_src != peak_dst. To get sensible behavior when mixing HDR and SDR content and displays, target-brightness is a generic filler for "the assumed brightness of SDR content". This gets rid of the weird display_scaled hack, sets the framework for multiple HDR functions with difference reference peaks, and allows us to (in a future commit) autodetect the right source peak from the HDR metadata. (Apart from metadata, the source peak can also be controlled via vf_format. For HDR content this adjusts the overall image brightness, for SDR content it's like simulating a different exposure)
* vo_opengl: add hable tone-mapping algorithmNiklas Haas2016-05-301-0/+11
| | | | | | | | Developed by John Hable for use in Uncharted 2. Also used by Frictional Games in SOMA. Originally inspired by a filmic tone mapping algorithm created by Kodak. From http://frictionalgames.blogspot.de/2012/09/tech-feature-hdr-lightning.html
* vo_opengl: rename tone-mapping=simple to reinhardNiklas Haas2016-05-301-1/+1
| | | | | This is the canonical name for the algorithm. I simply didn't know it before.
* vo_opengl: fix bicubic_fast in ES modewm42016-05-161-1/+1
| | | | | | GLES shaders disallow implicit conversion from int to float. This has been broken for quite a while.
* vo_opengl: implement more HDR tonemapping algorithmsNiklas Haas2016-05-161-0/+40
| | | | | | | | | | | | | | | | | | | | | This is now a configurable option, with tunable parameters. I got inspiration for these algorithms off wikipedia. "simple" seems to work pretty well, but not well enough to make it a reasonable default. Some other notable candidates: - Local functions (e.g. based on local contrast or gradient) - Clamp with soft knee (linear up to a point) - Mapping in CIE L*Ch. Map L smoothly, clamp C and h. - Color appearance models These will have to be implemented some other time. Note that the parameter "peak_src" to pass_tone_map should, in principle, be auto-detected from the SEI information of the source file where available. This will also have to be implemented in a later commit.
* vo_opengl: implement HDR (SMPTE ST2084)Niklas Haas2016-05-161-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, this relies on the user manually entering their display brightness (since we have no way to detect this at runtime or from ICC metadata). The default value of 250 was picked by looking at ~10 reviews on tftcentral.co.uk and realizing they all come with around 250 cd/m^2 out of the box. (In addition, ITU-R Rec. BT.2022 supports this) Since there is no metadata in FFmpeg to indicate usage of this TRC, the only way to actually play HDR content currently is to set ``--vf=format=gamma=st2084``. (It could be guessed based on SEI, but this is not implemented yet) Incidentally, since SEI is ignored, it's currently assumed that all content is scaled to 10,000 cd/m^2 (and hard-clipped where out of range). I don't see this assumption changing much, though. As an unfortunate consequence of the fact that we don't know the display brightness, mixed with the fact that LittleCMS' parametric tone curves are not flexible enough to support PQ, we have to build the 3DLUT against gamma 2.2 if it's used. This might be a good thing, though, consdering the PQ source space is probably not fantastic for interpolation either way. Partially addresses #2572.
* vo_opengl: abstract hook texture access behind macroNiklas Haas2016-05-151-30/+27
| | | | | | | | | | | | | | | | | | | | | | | | This macro takes care of rotation, swizzling, integer conversion and normalization automatically. I found the performance impact to be nonexistant for superxbr and debanding, although rotation *did* have an impact due to the extra matrix multiplication. (So it gets skipped where possible) All of the internal hooks have been rewritten to use this new mechanism, and the prescaler hooks have finally been separated from each other. This also means the prescale FBO kludge is no longer required. This fixes image corruption for image formats like 0bgr, and also fixes prescaling under rotation. (As well as other user hooks that have orientation-dependent access) The "raw" attributes (tex, tex_pos, pixel_size) are still un-rotated, in case something needs them, but ideally the hooks should be rewritten to use the new API as much as possible. The hooked texture has been renamed from just NAME to NAME_raw to make script authors notice the change (and also deemphasize direct texture access). This is also a step towards getting rid of the use_integer pass.
* vo_opengl: add optional hook pointsNiklas Haas2016-05-151-4/+1
| | | | | | | | | | | | | | These are "sequence points" where the image could be rendered out to an FBO, hooked, and re-loaded if any such hook exists. This is perfect for things like the current user shaders system, as well as optional effects like unsharp masking. Note that since we have to pick *some* FBO to store the optionally hooked texture, we just store it in an array indexed by an increasing counter. Since we only ever store as many as MAX_TEXTURE_HOOKS + all internal hook points entries, this is guaranteed to be enough space. This commit also removes some of the now unused FBOs.
* vo_opengl: remove some pointless compatibilitywm42016-05-141-3/+3
| | | | | | Remove non-texture_rg compatibility from LUT sampling. OpenGL without texture_rg support will always trigger dumb-mode, and dumb-mode does not use LUTs. It used not to, and that was when this made sense.
* vo_opengl: simplify and improve up scale=oversampleNiklas Haas2016-04-121-21/+5
| | | | | | | | | Since what we're doing is a linear blend of the four colors, we can just do it for free by using GPU sampling. This requires significantly fewer texture fetches and calculations to compute the final color, making it much more efficient. The code is also much shorter and simpler.
* vo_opengl: generate 3DLUT against source and use full BT.1886Niklas Haas2016-04-011-2/+3
| | | | | | | | | | | | | | | | | | | This commit refactors the 3DLUT loading mechanism to build the 3DLUT against the original source characteristics of the file. This allows us, among other things, to use a real BT.1886 profile for the source. This also allows us to actually use perceptual mappings. Finally, this reduces errors on standard gamut displays (where the previous 3DLUT target of BT.2020 was unreasonably wide). This also improves the overall accuracy of the 3DLUT due to eliminating rounding errors where possible, and allows for more accurate use of LUT-based ICC profiles. The current code is somewhat more ugly than necessary, because the idea was to implement this commit in a working state first, and then maybe refactor the profile loading mechanism in a later commit. Fixes #2815.
* vo_opengl: fix sharpen filterwm42016-03-161-2/+2
| | | | | | Regression since commit 93546f0c. Fixes #2956.
* vo_opengl: refactor pass_read_video and texture bindingNiklas Haas2016-03-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a pretty major rewrite of the internal texture binding mechanic, which makes it more flexible. In general, the difference between the old and current approaches is that now, all texture description is held in a struct img_tex and only explicitly bound with pass_bind. (Once bound, a texture unit is assumed to be set in stone and no longer tied to the img_tex) This approach makes the code inside pass_read_video significantly more flexible and cuts down on the number of weird special cases and spaghetti logic. It also has some improvements, e.g. cutting down greatly on the number of unnecessary conversion passes inside pass_read_video (which was previously mostly done to cope with the fact that the alternative would have resulted in a combinatorial explosion of code complexity). Some other notable changes (and potential improvements): - texture expansion is now *always* handled in pass_read_video, and the colormatrix never does this anymore. (Which means the code could probably be removed from the colormatrix generation logic, modulo some other VOs) - struct fbo_tex now stores both its "physical" and "logical" (configured) size, which cuts down on the amount of width/height baggage on some function calls - vo_opengl can now technically support textures with different bit depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries inside img_format.c doesn't export this (nor does ffmpeg support it, really) so the status quo of using the same tex_mul for all planes is kept. - dumb_mode is now only needed because of the indirect_fbo being in the main rendering pipeline. If we reintroduce p->use_indirect and thread a transform through the entire program this could be skipped where unnecessary, allowing for the removal of dumb_mode. But I'm not sure how to do this in a clean way. (Which is part of why it got introduced to begin with) - It would be trivial to resurrect source-shader now (it would just be one extra 'if' inside pass_read_video).
* vo_opengl: set uniform variable "pixel_size" for internal shadersigv2016-02-261-7/+5
|
* vo_opengl: declare vec4 color inside fragment shader stubNiklas Haas2016-02-231-6/+5
| | | | | | Why was this done so stupidly, with so many complicated special cases, before? Declare it once so the shader bits don't have to figure out where and when to do so themselves.
* vo_opengl: don't use normalized coords for debanding rectangle textureswm42016-02-181-1/+2
| | | | Fixes #2831.
* Change GPL/LGPL dual-licensed files to LGPLwm42016-01-191-12/+7
| | | | | | | | | | | Do this to make the license situation less confusing. This change should be of no consequence, since LGPL is compatible with GPL anyway, and making it LGPL-only does not restrict the use with GPL code. Additionally, the wording implies that this is allowed, and that we can just remove the GPL part.
* vo_opengl: fix shader compilation regressionwm42015-12-081-4/+0
| | | | | | | | | | The recent LUT adjustment changes broke interpolation. The concatenation of the shader stages is a bit messy, and it seems like sampler_prelude is not a good place to add this macro. Always add the macro to every shader instead. (While this doesn't seem too elegant, this isn't too inelegant either, and goes these problems out of the way.)
* vo_opengl: Fix minor LUT sampling errorBin Jin2015-12-071-7/+15
| | | | | | Define a macro to correct the coordinate for lookup texture. Cache the corrected coordinate for 1D filter and use mix() to minimize the performance impact.
* vo_opengl: improve boundary check for polar filtersBin Jin2015-12-051-2/+2
| | | | | | | | | If the sampling point is placed diagonally, the radius difference could be as large as sqrt(2.0). And a loosened check with (radius - 1) would potentially include pixels out of the range. Fix the check to handle those corner case properly to avoid unnecessary texture lookup and improve the performance a bit.
* vo_opengl: make 1D textures completely optionalwm42015-11-191-1/+5
| | | | | | | | Polar scalers use 1D textures, because they're slightly faster on some GPUs than 2D textures. But 2D textures work too, so add support for them. Allows using these scalers with ANGLE.
* vo_opengl: fix some more GLES shader issueswm42015-11-191-6/+6
| | | | | | Just like commit f9a2fc59. There are probably some more such cases. The vec2 constructor calls are probably fine, but don't bother with confusing inconsistencies.
* vo_opengl: make the default debanding settings less excessiveNiklas Haas2015-10-211-2/+2
| | | | | | | | | It's great that the new algorithm supports multiple placebo iterations and all, but it's really not necessary and hurts performance in the general case for the sake of the 0.1% that actually pause the screen and look for minute differences. Signed-off-by: wm4 <wm4@nowhere>