summaryrefslogtreecommitdiffstats
path: root/video/out/opengl/utils.h
Commit message (Collapse)AuthorAgeFilesLines
* vo_opengl: refactor RA texture and buffer updatesNiklas Haas2017-08-181-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - tex_uploads args are moved to a struct - the ability to directly upload texture data without going through a buffer is made explicit - the concept of buffer updates and buffer polling is made more explicit and generalized to buf_update as well (not just mapped buffers) - the ability to call tex_upload/buf_update on a tex/buf is made explicit during tex/buf creation - uploading from buffers now uses an explicit offset instead of implicitly comparing *src against buf->data, because not all buffers may actually be persistently mapped - the initial_data = immutable requirement is dropped. (May be re-added later for D3D11 if that ever becomes a thing) This change helps the vulkan abstraction immensely and also helps move common code (like the PBO pooling) out of ra_gl and into the opengl/utils.c This also technically has the side-benefit / side-constraint of using PBOs for OSD texture uploads as well, which actually seems to help performance on machines where --opengl-pbo is faster than the naive code path. Because of this, I decided to hook up the OSD code to the opengl-pbo option as well. One drawback of this refactor is that the GL_STREAM_COPY hack for texture uploads "got lost", but I think I'm happy with that going away anyway since DR almost fully deprecates it, and it's not the "right thing" anyway - but instead an nvidia-only hack to make this stuff work somewhat better on NUMA systems with discrete GPUs. Another change is that due to the way fencing works with ra_buf (we get one fence per ra_buf per upload) we have to use multiple ra_bufs instead of offsets into a shared buffer. But for OpenGL this is probably better anyway. It's possible that in future, we could support having independent “buffer slices” (each with their own fence/sync object), but this would be an optimization more than anything. I also think that we could address the underlying problem (memory closeness) differently by making the ra_vk memory allocator smart enough to chunk together allocations under the hood.
* vo_opengl: simplify mirroring and fix it if glBlitFramebuffer is usedwm42017-08-071-1/+8
| | | | | | | | | | | | | | | | | | The vp_w/vp_h variables and parameters were not really used anymore (they were redundant with ra_tex w/h) - but vp_h was still used to identify whether rendering should be done mirrored. Simplify this by adding a fbodst struct (some bad naming), which contains the render target texture, and some parameters how it should be rendered to (for now only flipping). It would not be appropriate to make this a member of ra_tex, so it's a separate struct. Introduces a weird regression for the first frame rendered after interpolation is toggled at runtime, but seems to work otherwise. This is possibly due to the change that blit() now mirrors, instead of just copying. (This is also why ra_fns.blit is changed.) Fixes #4719.
* vo_opengl: drop pointless fbotex_init() functionwm42017-08-071-2/+0
|
* vo_opengl: drop ra_gl.h from shader_cache.cNiklas Haas2017-08-061-0/+4
| | | | | | Since the GL *gl is no longer needed for the timers, we can get rid of the sc->gl dependency. This requires moving a utility function (which is not GL-specific anyway) out of gl_utils.h and into utils.h
* vo_opengl: move timers to struct raNiklas Haas2017-08-061-0/+9
| | | | | | | In order to prevent code duplication and keep the ra abstraction as small as possible, `ra` only implements the actual timer queries, it does not do pooling/averaging of the results. This is instead moved to a ra-neutral struct timer_pool in utils.c.
* vo_opengl: split utils.c/hwm42017-08-051-134/+15
| | | | | | | | | Actually GL-specific parts go into gl_utils.c/h, the shader cache (gl_sc*) into shader_cache.c/h. No semantic changes of any kind, except that the VAO helper is made public again as part of gl_utils.c (all while the goal for gl_utils.c itself is to be included by GL-specific code).
* vo_opengl: make fbotex helper use rawm42017-08-051-11/+8
| | | | | | | | Further work removing GL dependencies from the actual video renderer, and moving them into ra backends. Use of glInvalidateFramebuffer() falls away. I'd like to keep this, but it's better to readd it once shader runs are in ra.
* vo_opengl: drop unused custom texture filter for FBO helperwm42017-08-051-2/+0
|
* vo_opengl: make compute shaders more flexibleNiklas Haas2017-08-031-2/+2
| | | | | | This allows users to do their own custom sample writing, mainly meant to address use cases such as RAVU. Also clean up the compute shader code a bit.
* vo_opengl: manage dither texture via rawm42017-07-291-1/+5
| | | | | | | | | Also add some more helpers. Fix the broken math.h include statement. utils.c uses ra_gl.h internals, which it shouldn't, and which will be removed again as soon as this code gets converted to ra fully.
* vo_opengl: add PRINTF_ATTRIBUTE to gl_sc_ssboNiklas Haas2017-07-251-1/+1
| | | | | Doesn't uncover any bugs, but apparently we're getting in the habit of this anyway.
* vo_opengl: kill off FBOTEX_COMPUTE againNiklas Haas2017-07-251-1/+0
| | | | | | | | | | | The textures not having an FBO actually caused regressions when trying to render the subtitles on top of this texture (--blend-subtitles), which still relied on an FBO. So just kill off the logic entirely. Why worry about a single FBO wasted when we're allocating like 10 anyway. Fixes #4657.
* vo_opengl: support HDR peak detectionNiklas Haas2017-07-241-0/+2
| | | | | | | | | | | | | | This is done via compute shaders. As a consequence, the tone mapping algorithms had to be rewritten to compute their known constants in GLSL (ahead of time), instead of doing it once. Didn't affect performance. Using shmem/SSBO atomics in this way is extremely fast on nvidia, but it might be slow on other platforms. Needs testing. Unfortunately, setting up the SSBO still requires OpenGL calls, which means I can't have it in video_shaders.c, where it belongs. But I'll defer worrying about that until the backend refactor, since then I'll be breaking up the video/video_shaders structure anyway.
* vo_opengl: support compute shadersNiklas Haas2017-07-241-1/+6
| | | | | | | | These can either be invoked as dispatch_compute to do a single computation, or finish_pass_fbo (after setting compute_size_minimum) to render to a new texture using a compute shader. To make this stuff all work transparently, we try really, really hard to make compute shaders as identical to fragment shaders as possible in their behavior.
* vo_opengl: check format on some printf-like callswm42017-07-241-2/+4
| | | | Fix 1 incorrect use.
* vo_opengl: make VAO helper private, remove old VAO mechanismwm42017-07-221-16/+0
| | | | The struct describing vertex attributes is still public, of course.
* vo_opengl: add mechanism to create/cache VAO on the flywm42017-07-221-0/+5
| | | | | | This removes VAO handling from video.c. Instead the shader cache will create the VAO as needed. The consequence is that this creates a VAO per shader, which might be a bit wasteful, but doesn't matter anyway.
* vo_opengl: osd: refactor and simplifywm42017-07-221-2/+0
| | | | | | | | Reduce this to 1 draw call per OSD pass. This removes the need for some annoying special handling regarding 3D video support (we supported duplicating the OSD/subtitles for side-by-side 3D output etc.). Remove the unneeded texture sampler uniform thing.
* vo_opengl: coalesce intra-plane PBOsNiklas Haas2017-07-151-1/+1
| | | | | | | | | | | | | | Instead of allocating three PBOs and cycling through them, we allocate one PBO that's three times as large, and cycle through the subregion offsets. This results in arguably simpler code and faster initialization performance. Especially for 4K textures, initializing PBOs can take quite some time (e.g. 180ms -> 110ms). For 1080p, it's more like 66ms -> 52ms for me. The alignment to 4096 is completely unnecessary by spec, but we do it anyway just for peace of mind.
* vo_opengl: refactor vo performance subsystemNiklas Haas2017-07-011-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces `vo-performance` by `vo-passes`, bringing with it a number of changes and improvements: 1. mpv users can now introspect the vo_opengl passes, which is something that has been requested multiple times. 2. performance data is now measured per-pass, which helps both development and debugging. 3. since adding more passes is cheap, we can now report information for more passes (e.g. the blit pass, and the osd pass). Note: we also switch to nanosecond scale, to be able to measure these passes better. 4. `--user-shaders` authors can now describe their own passes, helping users both identify which user shaders are active at any given time as well as helping shader authors identify performance issues. 5. the timing data per pass is now exported as a full list of samples, so projects like Argon-/mpv-stats can immediately read out all of the samples and render a graph without having to manually poll this option constantly. Due to gl_timer's design being complicated (directly reading performance data would block, so we delay the actual read-back until the next _start command), it's vital not to conflate different passes that might be doing different things from one frame to another. To accomplish this, the actual timers are stored as part of the gl_shader_cache's sc_entry, which makes them unique for that exact shader. Starting and stopping the time measurement is easy to unify with the gl_sc architecture, because the existing API already relies on a "generate, render, reset" flow, so we can just put timer_start and timer_stop in sc_generate and sc_reset, respectively. The ugliest thing about this code is that due to the need to keep pass information relatively stable in between frames, we need to distinguish between "new" and "redrawn" frames, which bloats the code somewhat and also feels hacky and vo_opengl-specific. (But then again, this entire thing is vo_opengl-specific)
* vo_opengl: add option for caching shaders on diskwm42017-04-081-0/+3
| | | | | | | | | | | | | Mostly because of ANGLE (sadly). The implementation became unpleasantly big, but at least it's relatively self-contained. I'm not sure to what degree shaders from different drivers are compatible as in whether a driver would randomly misbehave if it's fed a binary created by another driver. The useless binayFormat parameter won't help it, as they can probably easily clash. As usual, OpenGL is pretty shit here.
* vo_opengl: remove two unused symbolswm42017-04-081-1/+0
|
* vo_opengl: read framebuffer depth from actual FBO used for renderingwm42017-03-201-1/+1
| | | | | | | | | | | | In some cases, such as when using the libmpv opengl-cb API, or with certain vo_opengl backends, the main framebuffer is never accessed. Instead, rendering is done to a FBO that acts as back buffer. This meant an incorrect/broken bit depth could be used for dithering. Change it to read the framebuffer depth lazily on the first render call. Also move the main FBO field out of the GL struct to MPGLContext, because the renderer's init function does not need to access it anymore.
* vo_opengl: move some init_gl code to utility functionswm42017-03-201-0/+3
|
* vo_opengl: don't rely on viewport to contain window dimensionswm42016-12-021-1/+1
| | | | | | | | | | | | | Apparently we don't always set the viewport to window dimensions anymore, e.g. if nothing is actually rendered. This means the viewport can contain old values. The window screenshot code uses the viewport values to guess the default framebuffer dimensions. With --force-window --idle --no-osc (which draws nothing and issues a glClear() command only), taking a screenshot would yield an image with the wrong size and possibly garbage in it. Fix this by explicitly passing the currently known window dimensions. Abusing the values stored in the viewport was questionable anyway.
* vo_opengl: dynamically manage texture unitswm42016-09-141-1/+3
| | | | | | | | | | | | | | | | | | | | | A minor cleanup that makes the code simpler, and guarantees that we cleanup the GL state properly at any point. We do this by reusing the uniform caching, and assigning each sampler uniform its own texture unit by incrementing a counter. This has various subtle consequences for the GL driver, which hopefully don't matter. For example, it will bind fewer textures at a time, but also rebind them more often. For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number of hook passes that can be bound at the same time. OSD rendering is an exception: we do many passes with the same shader, and rebinding the texture each pass. For now, this is handled in an unclean way, and we make the shader cache reserve texture unit 0 for the OSD texture. At a later point, we should allocate that one dynamically too, and just pass the texture unit to the OSD rendering code. Right now I feel like vo_rpi.c (may it rot in hell) is in the way.
* vo_opengl: require explicit reset on shader cache after renderingwm42016-09-141-1/+2
| | | | | | | | | The caller now has to call gl_sc_reset(), and _after_ rendering. This way we can unset OpenGL state that was setup for rendering. This affects the shader program, for example. The next commit uses this to automatically manage texture units via the shader cache. vo_rpi.c changes untested.
* vo_opengl: make the number of PBOs tunableNiklas Haas2016-09-141-1/+3
| | | | | Also set the number of PBOs from 2 to 3, which should be better for pipelining. This makes it easier to add more in the future.
* vo_opengl: use ringbuffer of PBOswm42016-07-031-1/+2
| | | | | | | | This is how PBOs are normally supposed to be used. Unfortunately I can't see an any absolute improvement on nVidia binary drivers and playing 4K material. Compared to the "old" PBO path with 1 buffer, the measured GL time decreases significantly, though.
* vo_opengl: move PBO upload handling to shared codewm42016-07-031-0/+12
| | | | | | | | | | | | | This introduces a gl_pbo_upload_tex() function, which works almost like our gl_upload_tex() glTexSubImage2D() wrapper, except it takes a struct which caches the PBO handles. It also takes the full texture size (to make allocating an ideal buffer size easier), and a parameter to disable PBOs (so that the caller doesn't have to duplicate the gl_upload_tex() call if PBOs are disabled or unavailable). This also removes warnings and fallbacks on PBO failure. We just silently try using PBOs on every frame, and if that fails at some point, revert to normal texture uploads. Probably doesn't matter.
* vo_opengl: remove uniform buffer object routinesBin Jin2016-06-181-2/+0
|
* vo_opengl: add time queriesNiklas Haas2016-06-071-0/+12
| | | | | | | | | | | | | | | | | | To avoid blocking the CPU, we use 8 time objects and rotate through them, only blocking until the last possible moment (before we need access to them on the next iteration through the ring buffer). I tested it out on my machine and 4 query objects were enough to guarantee block-free querying, but the extra margin shouldn't hurt. Frame render times are just output at the end of each frame, via MP_DBG. This might be improved in the future. (In particular, I want to expose these numbers as properties so that users get some more visible feedback about render times) Currently, we measure pass_render_frame and pass_draw_to_screen separately because the former might be called multiple times due to interpolation. Doing it this way gives more faithful numbers. Same goes for frame upload times.
* vo_opengl: fix other minor namespace issueswm42016-05-231-2/+2
| | | | See previous commit.
* vo_opengl: rename glUploadTex, drop unused parameterwm42016-05-231-3/+3
| | | | | | | | Rename it to get out of OpenGL's namespace. The gl_ prefix is used by other mpv functions, but no OpenGL ones. The "slice" parameter was never actually used, and all callers passed 0 for it.
* vo_opengl: unify PBO and normal OSD texture upload pathwm42016-05-231-2/+0
| | | | | | | | | | | | | The main change is actually that e first copy to a "staging" memory frame, and then upload this at once. The old non-PBO code called glTexsubImage2D for each OSD sub-bitmap. The new non-PBO code path is a bit faster now if there are many small sub-bitmaps (on Linux/nVidia). It's also a bit simpler, so this is a win. (Although I don't particularly appreciate the mixed normal/PBO texture code.)
* vo_opengl: support framebuffer invalidationwm42016-05-231-0/+1
| | | | | | | | | | Not sure how much can be gained with this, as we can't use it properly yet. For now, this is used only before rendering, which probably does overwhelmingly nothing. In the future, this should be used after temporary passes, which could possibly reduce memory usage and even memory bandwidth usage, depending on the drivers.
* vo_opengl: make the screen blue on shader errorsNiklas Haas2016-05-151-0/+2
| | | | | This helps visually signify that somthing went wrong, and prevents confusing shader compilation errors with other types of bugs.
* vo_opengl: support external user hooksNiklas Haas2016-05-151-0/+1
| | | | | | | | | | | This allows users to add their own near-arbitrary hooks to the vo_opengl processing pipeline, greatly enhancing the flexibility of user shaders. This enables, among other things, user shaders such as CrossBilateral, SuperRes, LumaSharpen and many more. To make parsing the user shaders easier, shaders are now loaded as bstrs, and the hooks are set up during video reconfig instead of on every single frame.
* vo_opengl: add hooks and rework pass_read_videoNiklas Haas2016-05-151-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hook mechanism allows arbitrary processing stages to get dispatched whenever certain named textures have been "finalized" by the code. This is mostly meant to serve as a change that opens up the internal processing in pass_read_video to user scripts, but as a side benefit all of the code dealing with offsets and plane alignment and other such confusing things has been rewritten. This hook mechanism is powerful enough to cover the needs of both debanding and prescaling (and more), so as a result they can be removed from pass_read_video entirely and implemented through hooks. Some avenues for optimization: - The prescale hook is currently somewhat distributed code-wise. It might be cleaner to split it into superxbr and NNEDI3 hooks which can each be self-contained. - It might be possible to move a large part of the hook code out to an external file (including the hook definitions for debanding and prescaling), which would be very much desired. - Currently, some stages (chroma merging, integer conversion) will *always* run even if unnecessary. I'm planning another series of refactors (deferred img_tex) to allow dropping unnecessary shader stages like these, but that's probably some ways away. In the meantime it would be doable to re-add some of the logic to skip these stages if we know we don't need them. - More hook locations could be added (?)
* vo_opengl: reorganize texture format handlingwm42016-05-121-1/+0
| | | | | | | | | | | | | This merges all knowledge about texture format into a central table. Most of the work done here is actually identifying which formats exactly are supported by OpenGL(ES) under which circumstances, and keeping this information in the format table in a somewhat declarative way. (Although only to the extend needed by mpv.) In particular, ES and float formats are a horrible mess. Again this is a big refactor that might cause regression on "obscure" configurations.
* vo_opengl: fix row-major vs. column-major confusionwm42016-03-281-2/+5
| | | | | | | | | gl_transform_vec() assumed column-major, while everything else seemed to assumed row-major memory organization for gl_transform.m. Also, gl_transform_trans() seems to contain additional confusion. This didn't matter until now, as everything has been orthogonal, this the swapped matrix entries were always 0.
* vo_opengl: refactor pass_read_video and texture bindingNiklas Haas2016-03-051-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a pretty major rewrite of the internal texture binding mechanic, which makes it more flexible. In general, the difference between the old and current approaches is that now, all texture description is held in a struct img_tex and only explicitly bound with pass_bind. (Once bound, a texture unit is assumed to be set in stone and no longer tied to the img_tex) This approach makes the code inside pass_read_video significantly more flexible and cuts down on the number of weird special cases and spaghetti logic. It also has some improvements, e.g. cutting down greatly on the number of unnecessary conversion passes inside pass_read_video (which was previously mostly done to cope with the fact that the alternative would have resulted in a combinatorial explosion of code complexity). Some other notable changes (and potential improvements): - texture expansion is now *always* handled in pass_read_video, and the colormatrix never does this anymore. (Which means the code could probably be removed from the colormatrix generation logic, modulo some other VOs) - struct fbo_tex now stores both its "physical" and "logical" (configured) size, which cuts down on the amount of width/height baggage on some function calls - vo_opengl can now technically support textures with different bit depths (e.g. 10 bit luma, 8 bit chroma) - but the APIs it queries inside img_format.c doesn't export this (nor does ffmpeg support it, really) so the status quo of using the same tex_mul for all planes is kept. - dumb_mode is now only needed because of the indirect_fbo being in the main rendering pipeline. If we reintroduce p->use_indirect and thread a transform through the entire program this could be skipped where unnecessary, allowing for the removal of dumb_mode. But I'm not sure how to do this in a clean way. (Which is part of why it got introduced to begin with) - It would be trivial to resurrect source-shader now (it would just be one extra 'if' inside pass_read_video).
* vo_opengl: support 10 bit support with ANGLEwm42016-01-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | GLES does not support high bit depth fixed point textures for unknown reasons, so direct 10 bit input is not possible. But we can still use integer textures, which are supported by GLES 3.0. These store integer data just like the standard fixed point textures, except they are not normalized on sampling. They also don't support bilinear filtering, and require a special sampler ("usampler2D"). While these texture formats enable us to shuffle the data to the GPU, they're rather impractical with the requirements mentioned above and our current architecture. One problem is that most code assumes it can always use bilinear scaling (even if bilinear is never used when using appropriate scale/cscale options). Another is that we don't have any concept of running a function on a texture in an uniform way. So for now, run a simple conversion step through a FBO. The FBO will use the rgba16f format normally, which gives enough bits for 10 bit, and will at least gracefully degrade with higher depth input. This is bound to be much slower than a more "direct" method, but at least it works and is simple to implement. The odd change of function call order in init_video() is to properly disable "dumb mode" (no FBO use) if these texture formats are in use.
* Change GPL/LGPL dual-licensed files to LGPLwm42016-01-191-12/+7
| | | | | | | | | | | Do this to make the license situation less confusing. This change should be of no consequence, since LGPL is compatible with GPL anyway, and making it LGPL-only does not restrict the use with GPL code. Additionally, the wording implies that this is allowed, and that we can just remove the GPL part.
* vo_opengl: handle GL_ARB_uniform_buffer_object with low GLSL versionswm42015-11-091-0/+1
| | | | Why is this stupid crap being so much a pain for no reason.
* vo_opengl: implement NNEDI3 prescalerBin Jin2015-11-051-0/