summaryrefslogtreecommitdiffstats
path: root/video/out/opengl/utils.c
Commit message (Collapse)AuthorAgeFilesLines
* vo_gpu: make screenshots use the GL rendererwm42018-02-111-12/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Using the GL renderer for color conversion will make sure screenshots will use the same conversion as normal video rendering. It can do this for all types of screenshots. The logic when to write 16 bit PNGs changes. To approximate the old behavior, we decide by looking whether the source video format has more than 8 bits per component. We apply this logic even for window screenshots. Also, 16 bit PNGs now always include an unused alpha channel. The reason is that FFmpeg has RGB48 and RGBA64 formats, but no RGB064. RGB48 is 3 bytes and usually not supported by GPUs for rendering, so we have to use RGBA64, which forces an alpha channel. Will break for users who use --target-trc and similar options. I considered creating a new gl_video context, but it could double GPU memory use, so I didn't. This uses FBOs instead of glGetTexImage(), because that increases the chance it could work on GLES (e.g. ANGLE). Untested. No support for the Vulkan and D3D11 backends yet. Fixes #5498. Also fixes #5240, because the code for reading back is not used with the new code path.
* vo_gpu: ra_gl: remove stride hackwm42017-11-121-4/+1
| | | | Same reasoning as in commit 9b5d062d36e3.
* vo_gpu: gl: implement proper extension string searchwm42017-10-021-0/+18
| | | | | | | The existing code in check_ext() avoided false positive due to sub-strings, but allowed false negatives. Fix this with slightly better search code, and make it available as function to other source files. (There are some cases of strstr() still around.)
* vo_opengl: refactor into vo_gpuNiklas Haas2017-09-211-313/+211
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is done in several steps: 1. refactor MPGLContext -> struct ra_ctx 2. move GL-specific stuff in vo_opengl into opengl/context.c 3. generalize context creation to support other APIs, and add --gpu-api 4. rename all of the --opengl- options that are no longer opengl-specific 5. move all of the stuff from opengl/* that isn't GL-specific into gpu/ (note: opengl/gl_utils.h became opengl/utils.h) 6. rename vo_opengl to vo_gpu 7. to handle window screenshots, the short-term approach was to just add it to ra_swchain_fns. Long term (and for vulkan) this has to be moved to ra itself (and vo_gpu altered to compensate), but this was a stop-gap measure to prevent this commit from getting too big 8. move ra->fns->flush to ra_gl_ctx instead 9. some other minor changes that I've probably already forgotten Note: This is one half of a major refactor, the other half of which is provided by rossy's following commit. This commit enables support for all linux platforms, while his version enables support for all non-linux platforms. Note 2: vo_opengl_cb.c also re-uses ra_gl_ctx so it benefits from the --opengl- options like --opengl-early-flush, --opengl-finish etc. Should be a strict superset of the old functionality. Disclaimer: Since I have no way of compiling mpv on all platforms, some of these ports were done blindly. Specifically, the blind ports included context_mali_fbdev.c and context_rpi.c. Since they're both based on egl_helpers, the port should have gone smoothly without any major changes required. But if somebody complains about a compile error on those platforms (assuming anybody actually uses them), you know where to complain.
* vo_opengl: generalize UBO packing/handlingNiklas Haas2017-09-121-0/+42
| | | | | | This is simultaneously generalized into two directions: 1. Support more sc_uniform types (needed for SC_UNIFORM_TYPE_PUSHC) 2. Support more flexible packing (needed for both PUSHC and ra_d3d11)
* vo_opengl: refactor timer_pool_measure (again)Niklas Haas2017-09-111-4/+6
| | | | | | | Instead of relying on power-of-two buffer sizes and unsigned overflow, make this code more robust (and also cleaner). Why can't C get a real modulo operator?
* vo_opengl: fix out-of-bounds access in timer_pool_measureNiklas Haas2017-09-111-1/+2
| | | | | | This was there even before the refactor, but the refactor exposed the bug. I hate C's useless fucking modulo operator so much. I've gotten hit by this exact bug way too many times.
* vo_opengl: refactor/fix mp_pass_perf codeNiklas Haas2017-09-111-15/+13
| | | | | | | | | | | | | This was needlessly complicated and prone to breakage, because even the references to the ring buffer could end up getting invalidated and containing garbage data on e.g. shader cache flush. For much the same reason why we can't keep around the *timer_pool, we're also forced to hard-copy the entire sample buffer per pass per frame. Not a huge deal, though. This is, what, a few kB per frame? We have more pressing CPU performance concerns anyway. Also simplified/fixed some other code.
* vo_opengl: be explicit about IMG_RWNiklas Haas2017-08-271-0/+1
| | | | | | | | | | | | Both vulkan and opengl distinguish between rendering to an image and using an image as a storage attachment. So make this an explicit capability instead of lumping it in with render_dst. (That way we could support, for example, using an image as a storage attachment without requiring a framebuffer) The real reason for this change is that you can directly use the output FBO as a storage attachment on vulkan but you can't on opengl, which makes this param structly separate from render_dst.
* vo_opengl: refactor tex_upload to ra_buf_poolNiklas Haas2017-08-221-57/+85
| | | | | | | | | | Also refactors the usage of tex_upload to make ra_tex_upload_pbo a RA-internal thing again. ra_buf_pool has the main advantage of being dynamically sized depending on buf_poll, so for OpenGL we'll end up only using one buffer (when not persistently mapping) - while for vulkan we'll use as many as necessary, which depends on the swapchain depth anyway.
* vo_opengl: allow texture uploads to failNiklas Haas2017-08-181-6/+3
| | | | Surprisingly makes the code shorter, not longer
* vo_opengl: refactor RA texture and buffer updatesNiklas Haas2017-08-181-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - tex_uploads args are moved to a struct - the ability to directly upload texture data without going through a buffer is made explicit - the concept of buffer updates and buffer polling is made more explicit and generalized to buf_update as well (not just mapped buffers) - the ability to call tex_upload/buf_update on a tex/buf is made explicit during tex/buf creation - uploading from buffers now uses an explicit offset instead of implicitly comparing *src against buf->data, because not all buffers may actually be persistently mapped - the initial_data = immutable requirement is dropped. (May be re-added later for D3D11 if that ever becomes a thing) This change helps the vulkan abstraction immensely and also helps move common code (like the PBO pooling) out of ra_gl and into the opengl/utils.c This also technically has the side-benefit / side-constraint of using PBOs for OSD texture uploads as well, which actually seems to help performance on machines where --opengl-pbo is faster than the naive code path. Because of this, I decided to hook up the OSD code to the opengl-pbo option as well. One drawback of this refactor is that the GL_STREAM_COPY hack for texture uploads "got lost", but I think I'm happy with that going away anyway since DR almost fully deprecates it, and it's not the "right thing" anyway - but instead an nvidia-only hack to make this stuff work somewhat better on NUMA systems with discrete GPUs. Another change is that due to the way fencing works with ra_buf (we get one fence per ra_buf per upload) we have to use multiple ra_bufs instead of offsets into a shared buffer. But for OpenGL this is probably better anyway. It's possible that in future, we could support having independent “buffer slices” (each with their own fence/sync object), but this would be an optimization more than anything. I also think that we could address the underlying problem (memory closeness) differently by making the ra_vk memory allocator smart enough to chunk together allocations under the hood.
* vo_opengl: make blitting an explicit capabilityNiklas Haas2017-08-181-0/+1
| | | | | | | | | Instead of merging it into render_dst. This is better for vulkan, because blitting in vulkan both does not require a FBO *and* requires a different image layout. Also less "hacky" for OpenGL, since now the weird blit=FBO requirement is an implementation detail of ra_gl
* vo_opengl: make ra_fns.timer_create optionalNiklas Haas2017-08-181-0/+3
|
* vo_opengl: simplify mirroring and fix it if glBlitFramebuffer is usedwm42017-08-071-11/+19
| | | | | | | | | | | | | | | | | | The vp_w/vp_h variables and parameters were not really used anymore (they were redundant with ra_tex w/h) - but vp_h was still used to identify whether rendering should be done mirrored. Simplify this by adding a fbodst struct (some bad naming), which contains the render target texture, and some parameters how it should be rendered to (for now only flipping). It would not be appropriate to make this a member of ra_tex, so it's a separate struct. Introduces a weird regression for the first frame rendered after interpolation is toggled at runtime, but seems to work otherwise. This is possibly due to the change that blit() now mirrors, instead of just copying. (This is also why ra_fns.blit is changed.) Fixes #4719.
* vo_opengl: drop pointless fbotex_init() functionwm42017-08-071-10/+1
|
* vo_opengl: drop ra_gl.h from shader_cache.cNiklas Haas2017-08-061-0/+16
| | | | | | Since the GL *gl is no longer needed for the timers, we can get rid of the sc->gl dependency. This requires moving a utility function (which is not GL-specific anyway) out of gl_utils.h and into utils.h
* vo_opengl: move timers to struct raNiklas Haas2017-08-061-0/+103
| | | | | | | In order to prevent code duplication and keep the ra abstraction as small as possible, `ra` only implements the actual timer queries, it does not do pooling/averaging of the results. This is instead moved to a ra-neutral struct timer_pool in utils.c.
* vo_opengl: split utils.c/hwm42017-08-051-1465/+24
| | | | | | | | | Actually GL-specific parts go into gl_utils.c/h, the shader cache (gl_sc*) into shader_cache.c/h. No semantic changes of any kind, except that the VAO helper is made public again as part of gl_utils.c (all while the goal for gl_utils.c itself is to be included by GL-specific code).
* vo_opengl: make fbotex helper use rawm42017-08-051-74/+44
| | | | | | | | Further work removing GL dependencies from the actual video renderer, and moving them into ra backends. Use of glInvalidateFramebuffer() falls away. I'd like to keep this, but it's better to readd it once shader runs are in ra.
* vo_opengl: drop unused custom texture filter for FBO helperwm42017-08-051-17/+2
|
* vo_opengl: make compute shaders more flexibleNiklas Haas2017-08-031-2/+2
| | | | | | This allows users to do their own custom sample writing, mainly meant to address use cases such as RAVU. Also clean up the compute shader code a bit.
* vo_opengl: manage dither texture via rawm42017-07-291-0/+12
| | | | | | | | | Also add some more helpers. Fix the broken math.h include statement. utils.c uses ra_gl.h internals, which it shouldn't, and which will be removed again as soon as this code gets converted to ra fully.
* vo_opengl: fix image uniforms for older OpenGLJames Ross-Gowan2017-07-261-0/+2
| | | | | This explicitly enables the GL_ARB_shader_image_load_store extension, which seems to fix compute shaders for Intel/GL 3.0.
* vo_opengl: kill off FBOTEX_COMPUTE againNiklas Haas2017-07-251-20/+15
| | | | | | | | | | | The textures not having an FBO actually caused regressions when trying to render the subtitles on top of this texture (--blend-subtitles), which still relied on an FBO. So just kill off the logic entirely. Why worry about a single FBO wasted when we're allocating like 10 anyway. Fixes #4657.
* vo_opengl: enable compute shader for mesaBin Jin2017-07-251-0/+7
| | | | | | | | | Mesa 17.1 supports compute shader but not full specs of OpenGL 4.3. Change the code to detect OpenGL extension "GL_ARB_compute_shader" rather than OpenGL version 4.3. HDR peak detection requires SSBO, and polar scaler requires 2D array extension. Add these extensions as requirement as well.
* vo_opengl: support HDR peak detectionNiklas Haas2017-07-241-0/+60
| | | | | | | | | | | | | | This is done via compute shaders. As a consequence, the tone mapping algorithms had to be rewritten to compute their known constants in GLSL (ahead of time), instead of doing it once. Didn't affect performance. Using shmem/SSBO atomics in this way is extremely fast on nvidia, but it might be slow on other platforms. Needs testing. Unfortunately, setting up the SSBO still requires OpenGL calls, which means I can't have it in video_shaders.c, where it belongs. But I'll defer worrying about that until the backend refactor, since then I'll be breaking up the video/video_shaders structure anyway.
* vo_opengl: support compute shadersNiklas Haas2017-07-241-94/+192
| | | | | | | | These can either be invoked as dispatch_compute to do a single computation, or finish_pass_fbo (after setting compute_size_minimum) to render to a new texture using a compute shader. To make this stuff all work transparently, we try really, really hard to make compute shaders as identical to fragment shaders as possible in their behavior.
* common, vo_opengl: add/use helper for formatted strings on the stackwm42017-07-241-2/+1
| | | | | | | | | | | Seems like I really like this C99 idiom. No reason not to generalize it do snprintf(). Introduce mp_tprintf(), which basically this idiom to snprintf(). This macro looks like it returns a string that was allocated with alloca() on the caller site, except it's portable C99/C11. (And unlike alloca(), the result is valid only within block scope.) Use it in 2 places in the vo_opengl code. But it has the potential to make a whole bunch of weird looking code look slightly nicer.
* vo_opengl: make VAO helper private, remove old VAO mechanismwm42017-07-221-28/+17
| | | | The struct describing vertex attributes is still public, of course.
* vo_opengl: add mechanism to create/cache VAO on the flywm42017-07-221-9/+56
| | | | | | This removes VAO handling from video.c. Instead the shader cache will create the VAO as needed. The consequence is that this creates a VAO per shader, which might be a bit wasteful, but doesn't matter anyway.
* vo_opengl: osd: refactor and simplifywm42017-07-221-13/+0
| | | | | | | | Reduce this to 1 draw call per OSD pass. This removes the need for some annoying special handling regarding 3D video support (we supported duplicating the OSD/subtitles for side-by-side 3D output etc.). Remove the unneeded texture sampler uniform thing.
* vo_opengl: styleNiklas Haas2017-07-161-1/+1
| | | | Use uintptr_t instead of size_t. Shouldn't matter, but is cleaner.
* vo_opengl: use MP_ALIGN_UP instead of FFALIGNNiklas Haas2017-07-161-1/+1
| | | | Consistency/style
* vo_opengl: use glBufferSubData instead of glMapBufferRangeNiklas Haas2017-07-161-24/+8
| | | | | | | | | | | | | | | | | | | | Performance seems pretty much unchanged but I no longer get nasty spikes on NUMA systems, probably because glBufferSubData runs in the driver or something. As a simplification of the code, we also just size the PBO to always have the full size, even for cropped textures. This seems slower but not by relevant amounts, and only affects e.g. --vf=crop. It also slightly increases VRAM usage for textures with big strides. This new code path is especially nice because it no longer depends on GL_ARB_map_buffer_range, and no longer uses any functions that can possibly fail, thus simplifying control flow and seemingly deprecating the manpage's claim about possible image corruption. In theory we could also reduce NUM_PBO_BUFFERS since it doesn't seem like we're streaming uploads anyway, but leave it in there just in case some drivers disagree...
* vo_opengl: update BufferData usage hintsNiklas Haas2017-07-151-2/+11
| | | | | | | | | | | | | | | STREAM is better than DYNAMIC because we're only using it once per frame. As for COPY vs DRAW, that was pretty much incorrect to begin with - but surprisngly, COPY is actually faster (sometimes significantly so, e.g. on my NUMA system). After testing, the best I can gather is that it has to do with the fact that COPY requires fewer redundant memcpy()s, and also 3x reduce RAM bandwidth (in theory). Anyway, that bit shouldn't introduce any regressions, it's just a documentation update. Maybe I'll change my mind about the comment again the future, it's really hard to tell. Vulkan, please save us!
* vo_opengl: coalesce intra-plane PBOsNiklas Haas2017-07-151-13/+16
| | | | | | | | | | | | | | Instead of allocating three PBOs and cycling through them, we allocate one PBO that's three times as large, and cycle through the subregion offsets. This results in arguably simpler code and faster initialization performance. Especially for 4K textures, initializing PBOs can take quite some time (e.g. 180ms -> 110ms). For 1080p, it's more like 66ms -> 52ms for me. The alignment to 4096 is completely unnecessary by spec, but we do it anyway just for peace of mind.
* vo_opengl: refactor vo performance subsystemNiklas Haas2017-07-011-29/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces `vo-performance` by `vo-passes`, bringing with it a number of changes and improvements: 1. mpv users can now introspect the vo_opengl passes, which is something that has been requested multiple times. 2. performance data is now measured per-pass, which helps both development and debugging. 3. since adding more passes is cheap, we can now report information for more passes (e.g. the blit pass, and the osd pass). Note: we also switch to nanosecond scale, to be able to measure these passes better. 4. `--user-shaders` authors can now describe their own passes, helping users both identify which user shaders are active at any given time as well as helping shader authors identify performance issues. 5. the timing data per pass is now exported as a full list of samples, so projects like Argon-/mpv-stats can immediately read out all of the samples and render a graph without having to manually poll this option constantly. Due to gl_timer's design being complicated (directly reading performance data would block, so we delay the actual read-back until the next _start command), it's vital not to conflate different passes that might be doing different things from one frame to another. To accomplish this, the actual timers are stored as part of the gl_shader_cache's sc_entry, which makes them unique for that exact shader. Starting and stopping the time measurement is easy to unify with the gl_sc architecture, because the existing API already relies on a "generate, render, reset" flow, so we can just put timer_start and timer_stop in sc_generate and sc_reset, respectively. The ugliest thing about this code is that due to the need to keep pass information relatively stable in between frames, we need to distinguish between "new" and "redrawn" frames, which bloats the code somewhat and also feels hacky and vo_opengl-specific. (But then again, this entire thing is vo_opengl-specific)
* vo_opengl: add option for caching shaders on diskwm42017-04-081-11/+122
| | | | | | | | | | | | | Mostly because of ANGLE (sadly). The implementation became unpleasantly big, but at least it's relatively self-contained. I'm not sure to what degree shaders from different drivers are compatible as in whether a driver would randomly misbehave if it's fed a binary created by another driver. The useless binayFormat parameter won't help it, as they can probably easily clash. As usual, OpenGL is pretty shit here.
* vo_opengl: remove two unused symbolswm42017-04-081-1/+0
|
* vo_opengl: use 16 bit textures with anglewm42017-04-031-1/+5
| | | | | | Regression due to 03fe506. It accidentally changed the default value if glGetTexLevelParameteriv() is not available, which is the case with ANGLE.
* vo_opengl: read framebuffer depth from actual FBO used for renderingwm42017-03-201-3/+3
| | | | | | | | | | | | In some cases, such as when using the libmpv opengl-cb API, or with certain vo_opengl backends, the main framebuffer is never accessed. Instead, rendering is done to a FBO that acts as back buffer. This meant an incorrect/broken bit depth could be used for dithering. Change it to read the framebuffer depth lazily on the first render call. Also move the main FBO field out of the GL struct to MPGLContext, because the renderer's init function does not need to access it anymore.
* vo_opengl: move some init_gl code to utility functionswm42017-03-201-0/+46
|
* vo_opengl: fix some undefined behaviorwm42017-03-181-1/+1
| | | | | | | The gl_timer_last_us() function could access samples[-1]. Fix by coercing to unsigned, so the % will put it into index [0,max). The real value returned in this corner case doesn't mean too much, I guess.
* Fix two typoswm42017-02-201-1/+1
| | | | They're unrelated. Sue me.
* vo_opengl: don't rely on viewport to contain window dimensionswm42016-12-021-7/+4
| | | | | | | | | | | | | Apparently we don't always set the viewport to window dimensions anymore, e.g. if nothing is actually rendered. This means the viewport can contain old values. The window screenshot code uses the viewport values to guess the default framebuffer dimensions. With --force-window --idle --no-osc (which draws nothing and issues a glClear() command only), taking a screenshot would yield an image with the wrong size and possibly garbage in it. Fix this by explicitly passing the currently known window dimensions. Abusing the values stored in the viewport was questionable anyway.
* vo_opengl: fix OSD with icc-profile after previous commitwm42016-09-141-0/+2
| | | | | | This happened to break because the texture unit wasn't reset to 0, which some code expects. The OSD code in particular set the OSD texture on the wrong texture unit, with the result that OSD/OSC was not visible.
* vo_opengl: dynamically manage texture unitswm42016-09-141-3/+40
| | | | | | | | | | | | | | | | | | | | | A minor cleanup that makes the code simpler, and guarantees that we cleanup the GL state properly at any point. We do this by reusing the uniform caching, and assigning each sampler uniform its own texture unit by incrementing a counter. This has various subtle consequences for the GL driver, which hopefully don't matter. For example, it will bind fewer textures at a time, but also rebind them more often. For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number of hook passes that can be bound at the same time. OSD rendering is an exception: we do many passes with the same shader, and rebinding the texture each pass. For now, this is handled in an unclean way, and we make the shader cache reserve texture unit 0 for the OSD texture. At a later point, we should allocate that one dynamically too, and just pass the texture unit to the OSD rendering code. Right now I feel like vo_rpi.c (may it rot in hell) is in the way.
* vo_opengl: require explicit reset on shader cache after renderingwm42016-09-141-4/+22
| | | | | | | | | The caller now has to