summaryrefslogtreecommitdiffstats
path: root/video/out/gpu/error_diffusion.c
Commit message (Collapse)AuthorAgeFilesLines
* various: replace dead links in commentsnanahi2024-02-111-1/+2
| | | | | Replace various dead links with live replacements or archives. Less friction for anyone who wants to look up these references.
* vo_gpu: opengl: some fixes to make compute shaders work with GLESPhilip Langdale2021-12-121-2/+2
| | | | | | | | | It's supposed to work with GLES >= 3.1 but we had all sorts of bad assumptions in our version handling, and then our compute shaders turn out not to be GLSL-ES compliant. This change contains some necessary, but insufficient, tweaks to the shaders. Perhaps we'll make it actually work some day.
* vo_gpu: process three component together in error diffusionBin Jin2019-06-161-42/+70
| | | | | | | | | | | | | | | | This started as a desperate attempt to lower the memory requirement of error diffusion, but later it turns out that this change also improved the rendering performance a lot (by 40% as I tested). Errors was stored in three uint before this change, each with 24bit precision. This change encoded them into a single uint, each with 8bit precision. This reduced the shared memory usage, as well as number of atomic operations, all by three times. Before this change, with the minimum required 32kb shared memory, only the `simple` kernel can be used to render 1080p video, which is mostly useless compare to `--dither=fruit`. After this change, 32kb can handle `burkes` kernel for 1080p, or `sierra-lite` for 4K resolution.
* vo_gpu: implement error diffusion for ditheringBin Jin2019-06-161-0/+288
This is a straightforward parallel implementation of error diffusion algorithms in compute shader. Basically we use single work group with maximal possible size to process the whole image. After a shift mapping we are able to process all pixels column by column. A large ring buffer are allocated in shared memory to speed things up. However the size of required shared memory depends linearly on the height of video window (or screen height in fullscreen mode). In case there is no enough shared memory, it will fallback to `--dither=fruit`. The maximal allowed work group size is hardcoded as 1024. Ideally we could query `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS`. But for whatever reason, it seems most high end card from nvidia and amd support only the minimal required value, so I guess we can stick to it for now.