path: root/libass/ass_func_template.h
Commit message | Author | Age | Files | Lines
* Merge ass_func_template.h into ass_bitmap_engine.c | Dr.Smile | 2022-12-04 | 1 | -77/+0
* bitmap_engine: reduce copy-paste for prototypes | Dr.Smile | 2022-12-04 | 1 | -68/+20
* rasterizer: use max-blending for outline pair merge | Dr.Smile | 2021-09-26 | 1 | -0/+4
  Max-blending is more correct than the previous addition-blending for small (sub-pixel) outline offsets.
  Fixes a buffer overrun (up to 16 bytes read past rst->tile) in add_bitmaps() when engine->tile_order < engine->align_order (AVX2 assembly with LARGE_TILES disabled), caused by insufficient padding.
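To illustrate the difference between the two blending modes named in this commit, here is a minimal C sketch. The helpers `merge_add` and `merge_max` are hypothetical names, not libass functions; they work on plain byte buffers and ignore tiling, alignment, and padding entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: addition-blend src into dst, saturating at 255. */
static void merge_add(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned) dst[i] + src[i];
        dst[i] = sum > 255 ? 255 : (uint8_t) sum;
    }
}

/* Hypothetical helper: max-blend src into dst (keep the larger coverage). */
static void merge_max(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (src[i] > dst[i])
            dst[i] = src[i];
}
```

When two nearly coincident outlines both cover a pixel at 128, `merge_add` saturates it to 255 while `merge_max` keeps it at 128, which is presumably the kind of double counting the commit refers to.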
* renderer: use complementary equations for \clip and \iclip | Dr.Smile | 2021-09-14 | 1 | -4/+4
  Basic symmetry considerations require that clip_func(src, clip) = iclip_func(src, 255 - clip), but we use unrelated expressions for them:
  clip_func(src, clip) = (src * clip + 255) / 256, whereas
  iclip_func(src, iclip) = max(src - iclip, 0) = max(src + clip - 255, 0).
  The version with multiplication is more correct (albeit slower), as it gives results closer to the infinite-resolution reference (per compare -s 8) in the half-transparency case.
  So I've picked the better function (the clip variant) and derived the other one from it.
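The symmetry requirement above can be checked with a small C sketch. `clip_func` and `iclip_func_old` follow the expressions quoted in the commit message; `iclip_func` is an assumed form obtained by substituting clip = 255 - iclip into the clip expression, and may not match the actual libass code line for line.

```c
#include <stdint.h>

/* Multiplicative clip, as quoted in the commit message. */
static uint8_t clip_func(uint8_t src, uint8_t clip)
{
    return (uint8_t) ((src * clip + 255) / 256);
}

/* Assumed complementary form: clip_func with clip = 255 - iclip. */
static uint8_t iclip_func(uint8_t src, uint8_t iclip)
{
    return (uint8_t) ((src * (255 - iclip) + 255) / 256);
}

/* Previous subtractive inverse clip, as quoted in the commit message. */
static uint8_t iclip_func_old(uint8_t src, uint8_t iclip)
{
    return src > iclip ? (uint8_t) (src - iclip) : 0;
}
```

For a half-transparent source and mask (src = 128, clip = 128, so iclip = 127), both `clip_func` and `iclip_func` give 64, while `iclip_func_old` gives 1. That gap in the half-transparency case is what the commit message describes.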
* Make argument order uniform between bitmap functions | Dr.Smile | 2021-04-21 | 1 | -4/+4
* Rewrite be_blur() assembly | Dr.Smile | 2021-04-21 | 1 | -4/+0
  Change list:
  - Fixed differences from C version introduced in f23b9ed64bd4ccf249c686616dd3f51a69d285dc.
  - Common macro for SSE2 and AVX2 versions.
  - Reduced register usage and efficient 32-bit version.
  - Full width memory operations instead of half-register.
  - Vectorized handling of width tails instead of byte/word loops.
  - Vectorized initial population of temporary buffer and final line fill.
  - Interleaved layout of temporary buffer.
  - Great speedup overall.
* Rewrite mul_bitmaps() assembly | Dr.Smile | 2021-04-21 | 1 | -4/+0
  Change list:
  - No special handling of unaligned case.
  - Common macro for SSE2 and AVX2 versions; the AVX2 version is now significantly faster.
  - Reduced register usage and efficient 32-bit version.
  - Full width memory operations instead of half-register.
  - Vectorized handling of width tails instead of byte loops.
* Rewrite add/sub_bitmaps() assembly | Dr.Smile | 2021-04-21 | 1 | -2/+1
  Change list:
  - No special handling of unaligned case (removed in previous commit).
  - Common macro for both add_bitmaps() and sub_bitmaps().
  - Reduced register usage and efficient 32-bit version.
  - add_bitmaps() no longer relies on zero padding.
  - Vectorized handling of width tails (instead of byte loop in sub_bitmaps(), great speedup for non-empty tails).
* Simplify blur algorithm | Dr.Smile | 2020-10-09 | 1 | -34/+32
  This commit removes prefilters altogether at the cost of an enlarged main filter kernel.
* renderer: remove legacy FreeType rasterizer | Dr.Smile | 2017-07-31 | 1 | -2/+0
* Implement cascade gaussian blur | Dr.Smile | 2015-07-04 | 1 | -0/+54
  This is the complete version with SSE2/AVX2 assembly.
  It should be much faster than the old algorithm even in pure C.
  An algorithm description can be found in this article (PDF): https://github.com/MrSmile/CascadeBlur/releases
  Close #9
* Switch to virtual function table | Dr.Smile | 2015-06-26 | 1 | -0/+80
  Use one pointer to a table of functions instead of a scattered bunch of function pointers.
  Different versions of these tables can be constructed at compile time.
  Also, bitmap memory alignment now depends only on SSE2/AVX2 support and is constant for every width.
  That simplifies the code without a noticeable performance penalty.
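The function-table idea described above can be sketched in C as follows. This is a minimal illustration, not the libass definition: `DemoEngine`, `demo_engine_c`, `demo_engine_wide`, `demo_select_engine`, and the `align_order` values are all hypothetical, and the "wide" table simply reuses the C routines as a stand-in for a SIMD build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function table; field names are illustrative only. */
typedef struct {
    int align_order;  /* log2 of the buffer alignment in bytes */
    void (*add_bitmaps)(uint8_t *dst, const uint8_t *src, size_t n);
    void (*sub_bitmaps)(uint8_t *dst, const uint8_t *src, size_t n);
} DemoEngine;

/* Saturating per-byte addition. */
static void demo_add_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned) dst[i] + src[i];
        dst[i] = v > 255 ? 255 : (uint8_t) v;
    }
}

/* Saturating per-byte subtraction. */
static void demo_sub_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] > src[i] ? (uint8_t) (dst[i] - src[i]) : 0;
}

/* Tables are constant data, so they are built at compile time. */
static const DemoEngine demo_engine_c = {
    .align_order = 0,
    .add_bitmaps = demo_add_c,
    .sub_bitmaps = demo_sub_c,
};

/* Stand-in for a SIMD table: same layout, larger alignment (32 bytes). */
static const DemoEngine demo_engine_wide = {
    .align_order = 5,
    .add_bitmaps = demo_add_c,
    .sub_bitmaps = demo_sub_c,
};

/* The caller stores a single pointer instead of many function pointers. */
static const DemoEngine *demo_select_engine(int has_wide_simd)
{
    return has_wide_simd ? &demo_engine_wide : &demo_engine_c;
}
```

A caller would keep the returned pointer and invoke `engine->add_bitmaps(dst, src, n)`. Because alignment is a property of the selected table, buffer allocation can use it without inspecting the bitmap width.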