| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Fixes https://github.com/libass/libass/issues/475
| |
Change list:
- Fixed differences from C version introduced
in f23b9ed64bd4ccf249c686616dd3f51a69d285dc.
- Common macro for SSE2 and AVX2 versions.
- Reduced register usage and efficient 32-bit version.
- Full width memory operations instead of half-register.
- Vectorized handling of width tails instead of byte/word loops.
- Vectorized initial population of temporary buffer and final line fill.
- Interleaved layout of temporary buffer.
- Great speedup overall.
| |
Change list:
- No special handling of unaligned case.
- Common macro for SSE2 and AVX2 versions;
the AVX2 version is now significantly faster.
- Reduced register usage and efficient 32-bit version.
- Full width memory operations instead of half-register.
- Vectorized handling of width tails instead of byte loops.
| |
Change list:
- No special handling of unaligned case (removed in previous commit).
- Common macro for both add_bitmaps() and sub_bitmaps().
- Reduced register usage and efficient 32-bit version.
- add_bitmaps() no longer relies on zero padding.
- Vectorized handling of width tails (instead of byte loop
in sub_bitmaps(), great speedup for non-empty tails).
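
For reference, the per-byte behavior that add_bitmaps() and sub_bitmaps() vectorize can be sketched in plain C as a saturating add and a saturating subtract over a strided bitmap. This is only an illustration: the function names, parameter order, and types below are assumptions, not the actual libass signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not the libass code):
 * dst = min(dst + src, 255) for every byte of a width x height region. */
static void add_bitmaps_ref(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            size_t width, size_t height)
{
    assert(dst && src);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            unsigned sum = (unsigned) dst[x] + src[x];
            dst[x] = sum > 255 ? 255 : (uint8_t) sum;  /* clamp at 255 */
        }
        dst += dst_stride;
        src += src_stride;
    }
}

/* Hypothetical scalar reference (not the libass code):
 * dst = max(dst - src, 0) for every byte of a width x height region. */
static void sub_bitmaps_ref(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            size_t width, size_t height)
{
    assert(dst && src);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++)
            dst[x] = dst[x] > src[x] ? (uint8_t) (dst[x] - src[x]) : 0;  /* clamp at 0 */
        dst += dst_stride;
        src += src_stride;
    }
}
```

Both loops touch only the bytes inside the given width, so neither depends on the contents of any padding past the end of a row.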
| |
This should fix the warnings introduced with nasm 2.15.
| |
This commit removes prefilters altogether at the cost of
an enlarged main filter kernel.
| |
This is the complete version with SSE2/AVX2 assembly.
It should be much faster than the old algorithm, even in pure C.
Algorithm description can be found in this article (PDF):
https://github.com/MrSmile/CascadeBlur/releases
Close #9
| |
Use a single pointer to a table of functions instead of a scattered
bunch of function pointers. Different versions of these tables
can be constructed at compile time.
Also, bitmap memory alignment now depends only on SSE2/AVX2 support
and is constant for every width. This simplifies the code without
a noticeable performance penalty.
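
A minimal sketch of the function-table idea, assuming hypothetical names throughout (BitmapEngine, align_order, select_engine, and the two tables are illustrative and not taken from the libass sources): each variant is a const table built at compile time, the alignment is a per-table constant, and callers hold one pointer to the chosen table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation type: combine n bytes of src into dst. */
typedef void (*BitmapOp)(uint8_t *dst, const uint8_t *src, size_t n);

/* Hypothetical table: one struct replaces several loose function pointers. */
typedef struct {
    int align_order;  /* log2 of buffer alignment, fixed per variant */
    BitmapOp add;
    BitmapOp sub;
} BitmapEngine;

static void add_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned) dst[i] + src[i];
        dst[i] = sum > 255 ? 255 : (uint8_t) sum;
    }
}

static void sub_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst && src);
    for (size_t i = 0; i < n; i++)
        dst[i] = dst[i] > src[i] ? (uint8_t) (dst[i] - src[i]) : 0;
}

/* Tables are constructed at compile time. The "wide" table reuses the C
 * routines as stand-ins for SIMD versions; only its alignment differs. */
static const BitmapEngine engine_c    = { 0, add_c, sub_c };
static const BitmapEngine engine_wide = { 5, add_c, sub_c };

/* Pick one table once; the alignment no longer depends on bitmap width. */
static const BitmapEngine *select_engine(int has_simd)
{
    return has_simd ? &engine_wide : &engine_c;
}
```

With this layout a caller stores a single `const BitmapEngine *` and derives the allocation alignment as `(size_t) 1 << engine->align_order`, regardless of the bitmap width.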
| |
Signed-off-by: Rodger Combs <rodger.combs@gmail.com>
| |
A loop initializer was missing, so output started one row too early.
A loop condition check was missing, so output sometimes stopped
one column too late.
Also remove a couple of dead assignments.