summaryrefslogtreecommitdiffstats
path: root/video/out/opengl/utils.c
Commit message (Collapse)AuthorAgeFilesLines
* vo_opengl: fix image uniforms for older OpenGLJames Ross-Gowan2017-07-261-0/+2
| | | | | This explicitly enables the GL_ARB_shader_image_load_store extension, which seems to fix compute shaders for Intel/GL 3.0.
* vo_opengl: kill off FBOTEX_COMPUTE againNiklas Haas2017-07-251-20/+15
| | | | | | | | | | | The textures not having an FBO actually caused regressions when trying to render the subtitles on top of this texture (--blend-subtitles), which still relied on an FBO. So just kill off the logic entirely. Why worry about a single FBO wasted when we're allocating like 10 anyway. Fixes #4657.
* vo_opengl: enable compute shader for mesaBin Jin2017-07-251-0/+7
| | | | | | | | | Mesa 17.1 supports compute shader but not full specs of OpenGL 4.3. Change the code to detect OpenGL extension "GL_ARB_compute_shader" rather than OpenGL version 4.3. HDR peak detection requires SSBO, and polar scaler requires 2D array extension. Add these extensions as requirement as well.
* vo_opengl: support HDR peak detectionNiklas Haas2017-07-241-0/+60
| | | | | | | | | | | | | | This is done via compute shaders. As a consequence, the tone mapping algorithms had to be rewritten to compute their known constants in GLSL (ahead of time), instead of doing it once. Didn't affect performance. Using shmem/SSBO atomics in this way is extremely fast on nvidia, but it might be slow on other platforms. Needs testing. Unfortunately, setting up the SSBO still requires OpenGL calls, which means I can't have it in video_shaders.c, where it belongs. But I'll defer worrying about that until the backend refactor, since then I'll be breaking up the video/video_shaders structure anyway.
* vo_opengl: support compute shadersNiklas Haas2017-07-241-94/+192
| | | | | | | | These can either be invoked as dispatch_compute to do a single computation, or finish_pass_fbo (after setting compute_size_minimum) to render to a new texture using a compute shader. To make this stuff all work transparently, we try really, really hard to make compute shaders as identical to fragment shaders as possible in their behavior.
* common, vo_opengl: add/use helper for formatted strings on the stackwm42017-07-241-2/+1
| | | | | | | | | | | Seems like I really like this C99 idiom. No reason not to generalize it do snprintf(). Introduce mp_tprintf(), which basically this idiom to snprintf(). This macro looks like it returns a string that was allocated with alloca() on the caller site, except it's portable C99/C11. (And unlike alloca(), the result is valid only within block scope.) Use it in 2 places in the vo_opengl code. But it has the potential to make a whole bunch of weird looking code look slightly nicer.
* vo_opengl: make VAO helper private, remove old VAO mechanismwm42017-07-221-28/+17
| | | | The struct describing vertex attributes is still public, of course.
* vo_opengl: add mechanism to create/cache VAO on the flywm42017-07-221-9/+56
| | | | | | This removes VAO handling from video.c. Instead the shader cache will create the VAO as needed. The consequence is that this creates a VAO per shader, which might be a bit wasteful, but doesn't matter anyway.
* vo_opengl: osd: refactor and simplifywm42017-07-221-13/+0
| | | | | | | | Reduce this to 1 draw call per OSD pass. This removes the need for some annoying special handling regarding 3D video support (we supported duplicating the OSD/subtitles for side-by-side 3D output etc.). Remove the unneeded texture sampler uniform thing.
* vo_opengl: styleNiklas Haas2017-07-161-1/+1
| | | | Use uintptr_t instead of size_t. Shouldn't matter, but is cleaner.
* vo_opengl: use MP_ALIGN_UP instead of FFALIGNNiklas Haas2017-07-161-1/+1
| | | | Consistency/style
* vo_opengl: use glBufferSubData instead of glMapBufferRangeNiklas Haas2017-07-161-24/+8
| | | | | | | | | | | | | | | | | | | | Performance seems pretty much unchanged but I no longer get nasty spikes on NUMA systems, probably because glBufferSubData runs in the driver or something. As a simplification of the code, we also just size the PBO to always have the full size, even for cropped textures. This seems slower but not by relevant amounts, and only affects e.g. --vf=crop. It also slightly increases VRAM usage for textures with big strides. This new code path is especially nice because it no longer depends on GL_ARB_map_buffer_range, and no longer uses any functions that can possibly fail, thus simplifying control flow and seemingly deprecating the manpage's claim about possible image corruption. In theory we could also reduce NUM_PBO_BUFFERS since it doesn't seem like we're streaming uploads anyway, but leave it in there just in case some drivers disagree...
* vo_opengl: update BufferData usage hintsNiklas Haas2017-07-151-2/+11
| | | | | | | | | | | | | | | STREAM is better than DYNAMIC because we're only using it once per frame. As for COPY vs DRAW, that was pretty much incorrect to begin with - but surprisngly, COPY is actually faster (sometimes significantly so, e.g. on my NUMA system). After testing, the best I can gather is that it has to do with the fact that COPY requires fewer redundant memcpy()s, and also 3x reduce RAM bandwidth (in theory). Anyway, that bit shouldn't introduce any regressions, it's just a documentation update. Maybe I'll change my mind about the comment again the future, it's really hard to tell. Vulkan, please save us!
* vo_opengl: coalesce intra-plane PBOsNiklas Haas2017-07-151-13/+16
| | | | | | | | | | | | | | Instead of allocating three PBOs and cycling through them, we allocate one PBO that's three times as large, and cycle through the subregion offsets. This results in arguably simpler code and faster initialization performance. Especially for 4K textures, initializing PBOs can take quite some time (e.g. 180ms -> 110ms). For 1080p, it's more like 66ms -> 52ms for me. The alignment to 4096 is completely unnecessary by spec, but we do it anyway just for peace of mind.
* vo_opengl: refactor vo performance subsystemNiklas Haas2017-07-011-29/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces `vo-performance` by `vo-passes`, bringing with it a number of changes and improvements: 1. mpv users can now introspect the vo_opengl passes, which is something that has been requested multiple times. 2. performance data is now measured per-pass, which helps both development and debugging. 3. since adding more passes is cheap, we can now report information for more passes (e.g. the blit pass, and the osd pass). Note: we also switch to nanosecond scale, to be able to measure these passes better. 4. `--user-shaders` authors can now describe their own passes, helping users both identify which user shaders are active at any given time as well as helping shader authors identify performance issues. 5. the timing data per pass is now exported as a full list of samples, so projects like Argon-/mpv-stats can immediately read out all of the samples and render a graph without having to manually poll this option constantly. Due to gl_timer's design being complicated (directly reading performance data would block, so we delay the actual read-back until the next _start command), it's vital not to conflate different passes that might be doing different things from one frame to another. To accomplish this, the actual timers are stored as part of the gl_shader_cache's sc_entry, which makes them unique for that exact shader. Starting and stopping the time measurement is easy to unify with the gl_sc architecture, because the existing API already relies on a "generate, render, reset" flow, so we can just put timer_start and timer_stop in sc_generate and sc_reset, respectively. The ugliest thing about this code is that due to the need to keep pass information relatively stable in between frames, we need to distinguish between "new" and "redrawn" frames, which bloats the code somewhat and also feels hacky and vo_opengl-specific. (But then again, this entire thing is vo_opengl-specific)
* vo_opengl: add option for caching shaders on diskwm42017-04-081-11/+122
| | | | | | | | | | | | | Mostly because of ANGLE (sadly). The implementation became unpleasantly big, but at least it's relatively self-contained. I'm not sure to what degree shaders from different drivers are compatible as in whether a driver would randomly misbehave if it's fed a binary created by another driver. The useless binayFormat parameter won't help it, as they can probably easily clash. As usual, OpenGL is pretty shit here.
* vo_opengl: remove two unused symbolswm42017-04-081-1/+0
|
* vo_opengl: use 16 bit textures with anglewm42017-04-031-1/+5
| | | | | | Regression due to 03fe506. It accidentally changed the default value if glGetTexLevelParameteriv() is not available, which is the case with ANGLE.
* vo_opengl: read framebuffer depth from actual FBO used for renderingwm42017-03-201-3/+3
| | | | | | | | | | | | In some cases, such as when using the libmpv opengl-cb API, or with certain vo_opengl backends, the main framebuffer is never accessed. Instead, rendering is done to a FBO that acts as back buffer. This meant an incorrect/broken bit depth could be used for dithering. Change it to read the framebuffer depth lazily on the first render call. Also move the main FBO field out of the GL struct to MPGLContext, because the renderer's init function does not need to access it anymore.
* vo_opengl: move some init_gl code to utility functionswm42017-03-201-0/+46
|
* vo_opengl: fix some undefined behaviorwm42017-03-181-1/+1
| | | | | | | The gl_timer_last_us() function could access samples[-1]. Fix by coercing to unsigned, so the % will put it into index [0,max). The real value returned in this corner case doesn't mean too much, I guess.
* Fix two typoswm42017-02-201-1/+1
| | | | They're unrelated. Sue me.
* vo_opengl: don't rely on viewport to contain window dimensionswm42016-12-021-7/+4
| | | | | | | | | | | | | Apparently we don't always set the viewport to window dimensions anymore, e.g. if nothing is actually rendered. This means the viewport can contain old values. The window screenshot code uses the viewport values to guess the default framebuffer dimensions. With --force-window --idle --no-osc (which draws nothing and issues a glClear() command only), taking a screenshot would yield an image with the wrong size and possibly garbage in it. Fix this by explicitly passing the currently known window dimensions. Abusing the values stored in the viewport was questionable anyway.
* vo_opengl: fix OSD with icc-profile after previous commitwm42016-09-141-0/+2
| | | | | | This happened to break because the texture unit wasn't reset to 0, which some code expects. The OSD code in particular set the OSD texture on the wrong texture unit, with the result that OSD/OSC was not visible.
* vo_opengl: dynamically manage texture unitswm42016-09-141-3/+40
| | | | | | | | | | | | | | | | | | | | | A minor cleanup that makes the code simpler, and guarantees that we cleanup the GL state properly at any point. We do this by reusing the uniform caching, and assigning each sampler uniform its own texture unit by incrementing a counter. This has various subtle consequences for the GL driver, which hopefully don't matter. For example, it will bind fewer textures at a time, but also rebind them more often. For some reason we keep TEXUNIT_VIDEO_NUM, because it limits the number of hook passes that can be bound at the same time. OSD rendering is an exception: we do many passes with the same shader, and rebinding the texture each pass. For now, this is handled in an unclean way, and we make the shader cache reserve texture unit 0 for the OSD texture. At a later point, we should allocate that one dynamically too, and just pass the texture unit to the OSD rendering code. Right now I feel like vo_rpi.c (may it rot in hell) is in the way.
* vo_opengl: require explicit reset on shader cache after renderingwm42016-09-141-4/+22
| | | | | | | | | The caller now has to call gl_sc_reset(), and _after_ rendering. This way we can unset OpenGL state that was setup for rendering. This affects the shader program, for example. The next commit uses this to automatically manage texture units via the shader cache. vo_rpi.c changes untested.
* vo_opengl: make the number of PBOs tunableNiklas Haas2016-09-141-4/+4
| | | | | Also set the number of PBOs from 2 to 3, which should be better for pipelining. This makes it easier to add more in the future.
* vo_opengl: better behavior in GL error corner caseswm42016-09-121-4/+5
| | | | | | | | | | | If the shader fails to compile, and assertion could trigger in gl_sc_gen_shader_and_reset() due to the code trying to recreate the shader every time, and re-appending the uniforms every time. Just reset the uniform array to fix this. Some disturbed GL drivers might not return anything for glGetShaderiv() if the GL state got "lost", so initialize variables just for additional robustness.
* vo_opengl: use ringbuffer of PBOswm42016-07-031-6/+11
| | | | | | | | This is how PBOs are normally supposed to be used. Unfortunately I can't see an any absolute improvement on nVidia binary drivers and playing 4K material. Compared to the "old" PBO path with 1 buffer, the measured GL time decreases significantly, though.
* vo_opengl: move PBO upload handling to shared codewm42016-07-031-0/+65
| | | | | | | | | | | | | This introduces a gl_pbo_upload_tex() function, which works almost like our gl_upload_tex() glTexSubImage2D() wrapper, except it takes a struct which caches the PBO handles. It also takes the full texture size (to make allocating an ideal buffer size easier), and a parameter to disable PBOs (so that the caller doesn't have to duplicate the gl_upload_tex() call if PBOs are disabled or unavailable). This also removes warnings and fallbacks on PBO failure. We just silently try using PBOs on every frame, and if that fails at some point, revert to normal texture uploads. Probably doesn't matter.
* vo_opengl utils: use gl->main_fb when reading window contentquilloss2016-06-261-1/+4
| | | | | | | | The main framebuffer is not the default framebuffer for the dxinterop backend. Bind the main framebuffer and use the appropriate attachment when reading the window content. Fix #3284
* vo_opengl: remove uniform buffer object routinesBin Jin2016-06-181-28/+2
|
* vo_opengl: do not leak previous FBO when reallocating itwm42016-06-081-0/+2
| | | | WTF of the day.
* vo_opengl: add time queriesNiklas Haas2016-06-071-0/+126
| | | | | | | | | | | | | | | | | | To avoid blocking the CPU, we use 8 time objects and rotate through them, only blocking until the last possible moment (before we need access to them on the next iteration through the ring buffer). I tested it out on my machine and 4 query objects were enough to guarantee block-free querying, but the extra margin shouldn't hurt. Frame render times are just output at the end of each frame, via MP_DBG. This might be improved in the future. (In particular, I want to expose these numbers as properties so that users get some more visible feedback about render times) Currently, we measure pass_render_frame and pass_draw_to_screen separately because the former might be called multiple times due to interpolation. Doing it this way gives more faithful numbers. Same goes for frame upload times.
* vo_opengl: enable color management on GLESJames Ross-Gowan2016-05-271-1/+5
| | | | | | This requires the GL_EXT_texture_norm16 extension and works in ANGLE. A default precision had to be set for sampler3Ds, otherwise the shaders would fail to compile.
* vo_opengl: fix other minor namespace issueswm42016-05-231-4/+4
| | | | See previous commit.
* vo_opengl: rename glUploadTex, drop unused parameterwm42016-05-231-9/+6
| | | | | | | | Rename it to get out of OpenGL's namespace. The gl_ prefix is used by other mpv functions, but no OpenGL ones. The "slice" parameter was never actually used, and all callers passed 0 for it.
* vo_opengl: unify PBO and normal OSD texture upload pathwm42016-05-231-25/+0
| | | | | | | | | | | | | The main change is actually that e first copy to a "staging" memory frame, and then upload this at once. The old non-PBO code called glTexsubImage2D for each OSD sub-bitmap. The new non-PBO code path is a bit faster now if there are many small sub-bitmaps (on Linux/nVidia). It's also a bit simpler, so this is a win. (Although I don't particularly appreciate the mixed normal/PBO texture code.)
* vo_opengl: support framebuffer invalidationwm42016-05-231-0/+15
| | | | | | | | | | Not sure how much can be gained with this, as we can't use it properly yet. For now, this is used only before rendering, which probably does overwhelmingly nothing. In the future, this should be used after temporary passes, which could possibly reduce memory usage and even memory bandwidth usage, depending on the drivers.
* vo_opengl: make gl_sc_enable_extension() permanent/idempotentwm42016-05-191-2/+12
| | | | | | | | No reason not to, and makes the following commit slightly simpler. In fact, this makes the shaders more correct too. Normally, "#extension" must come before any normal shader text, including the "precision" directive. Not sure why this worked before. (Probably didn't.)
* vo_opengl: move UT_buffer to switch handlingwm42016-05-171-5/+5
| | | | No reason to make it a special case.
* vo_opengl: make number of cached shaders/uniform dynamicwm42016-05-171-12/+24
| | | | | | | | | | Use dynamic memory allocation, as the static allocation is starting to get annoying. Currently, SC_MAX_ENTRIES is essentially still a static upper limit on the number of shaders. But in future we could try a more clever cache replacement strategy, which does not keep stale entries forever if the maximum happens not to be reached.
* vo_opengl: move cached uniforms to a separate structwm42016-05-171-10/+15
|
* vo_opengl: increase shader limitsNiklas Haas2016-05-171-2/+2
| | | | | | | | | | | | | The new uniforms introduced by 362015c have exceeded the uniform limit when using high-radius tscale. In addition, the SC limit of 32 entries might be pushing it with user shaders. Just make these value a bigger to delay the onset of this same failure mode. Maybe in the future it should be reworked to grow dynamically? Either way, we *can* always predict a static upper bound on the number of uniforms and shader cache entries, it's just that we forgot to do so. Fixes #3151
* vo_opengl: make the screen blue on shader errorsNiklas Haas2016-05-151-0/+18
| | | | | This helps visually signify that somthing went wrong, and prevents confusing shader compilation errors with other types of bugs.
* vo_opengl: support external user hooksNiklas Haas2016-05-151-0/+5
| | | | | | | | | | | This allows users to add their own near-arbitrary hooks to the vo_opengl processing pipeline, greatly enhancing the flexibility of user shaders. This enables, among other things, user shaders such as CrossBilateral, SuperRes, LumaSharpen and many more. To make parsing the user shaders easier, shaders are now loaded as bstrs, and the hooks are set up during video reconfig instead of on every single frame.
* vo_opengl: remove some pointless compatibilitywm42016-05-141-1/+0
| | | | | | Remove non-texture_rg compatibility from LUT sampling. OpenGL without texture_rg support will always trigger dumb-mode, and dumb-mode does not use LUTs. It used not to, and that was when this made sense.
* vo_opengl: reorganize texture format handlingwm42016-05-121-96/+17
| | | | | | | | | | | | | This merges all knowledge about texture format into a central table. Most of the work done here is actually identifying which formats exactly are supported by OpenGL(ES) under which circumstances, and keeping this information in the format table in a somewhat declarative way. (Although only to the extend needed by mpv.) In particular, ES and float formats are a horrible mess. Again this is a big refactor that might cause regression on "obscure" configurations.
* vo_opengl: angle: dump translated shaderswm42016-05-121-0/+10
| | | | Helpful for debugging and such.
* vo_opengl: d3d11egl: native NV12 sampling supportwm42016-05-101-0/+1
| | | | | | | | | | | | | | | | | This uses EGL_ANGLE_stream_producer_d3d_texture_nv12 and related extensions to map the D3D textures coming from the hardware decoder directly in GL. In theory this would be trivial to achieve, but unfortunately ANGLE does not have a mechanism to "import" D3D textures as GL textures. Instead, an awkward mechanism via EGL_KHR_stream was implemented, which involves at least 5 extensions and a lot of glue code. (Even worse than VAAPI EGL interop, and very far from the simplicity you get on OSX.) The ANGLE mechanism so far supports only the NV12 texture format, which means 10 bit won't work. It also does not work in ES3 mode yet. For these reasons, the "old" ID3D11VideoProcessor code is kept and used as a fallback.
* vo_opengl: slightly compress gl_set_debug_logger()wm42016-03-281-7/+2
| | | | No functional changes.
* vo_opengl: reduce temporary variables in gl_transform_trans()wm42016-03-281-5/+5
| | | | | Using a single gl_transform variable instead of many float ones makes it easier to see what it's doing. No functional change.
* vo_opengl: fix row-major vs. column-major confusionwm42016-03-281-2/+2
| | | | | | | | | gl_transform_vec() assumed column-major, while everything else seemed to assumed row-major memory organization for gl_transform.m. Also, gl_transform_trans() seems to contain additional confusion. This didn't matter until now, as everything has been orthogonal, this the swapped matrix entries were always 0.
* vo_opengl: minor coding style adjustmentwm42016-03-241-3/+4
|
* vo_opengl: utils: some more minor shader string building optimizationwm42016-03-241-23/