| Commit message (Collapse) | Author | Age | Files | Lines |
|
TODO: The SSSE3 implementation is marked as SSE2; I'll need to fix this
(and possibly provide an actual SSE2 variant using pmulhw instead of pmulhrsw)
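
For reference, a scalar sketch of what the two instructions compute per
16-bit lane. The function names are hypothetical and this is not the
project's code; it only illustrates why swapping pmulhrsw for pmulhw is
not a drop-in change: pmulhw truncates and returns the product shifted
right by 16, while pmulhrsw rounds and returns it shifted right by 15.
The sketch assumes arithmetic right shift of negative values, which is
implementation-defined in C but holds on common compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of PMULHW (SSE2): the high 16 bits of the
 * signed 32-bit product, i.e. (a * b) >> 16 with truncation. */
static int16_t mulhi_trunc(int16_t a, int16_t b)
{
    int32_t prod = (int32_t) a * b;
    return (int16_t) (prod >> 16);
}

/* Scalar model of one lane of PMULHRSW (SSSE3): the product scaled by
 * 2^-15 and rounded to nearest, i.e. ((a * b >> 14) + 1) >> 1.
 * The final conversion wraps for a == b == -32768, as the instruction does
 * (implementation-defined in C, but typical compilers wrap). */
static int16_t mulhrs_round(int16_t a, int16_t b)
{
    int32_t prod = (int32_t) a * b;
    return (int16_t) (((prod >> 14) + 1) >> 1);
}

/* Small usage example: the same inputs give results that differ both in
 * scale (factor of two) and in rounding. */
static void mul_models_selfcheck(void)
{
    assert(mulhi_trunc(16384, 16384) == 4096);   /* 2^28 >> 16 */
    assert(mulhrs_round(16384, 16384) == 8192);  /* 2^28 / 2^15 */
    assert(mulhi_trunc(3, 16384) == 0);          /* 0.75 truncated */
    assert(mulhrs_round(3, 16384) == 2);         /* 1.5 rounded up */
}
```

An SSE2 fallback built on pmulhw would therefore have to compensate for
the missing bit of scale and decide how to handle the rounding.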
|
This is the complete version with SSE2/AVX2 assembly.
It should be much faster than the old algorithm, even in pure C.
A description of the algorithm can be found in this article (PDF):
https://github.com/MrSmile/CascadeBlur/releases
Close #9
|
Use a single pointer to a table of functions instead of a scattered
bunch of function pointers. Different versions of these tables
can be constructed at compile time.
Also, bitmap memory alignment now depends only on SSE2/AVX2 support
and is constant for every width. That simplifies the code without
a noticeable performance penalty.
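
A minimal sketch of the idea, not the actual code from this commit. All
names here (BitmapEngine, select_engine, aligned_stride, the CPU_FLAG_*
constants) are hypothetical, the SSE2/AVX2 tables reuse the C routines as
stand-ins for the assembly versions, and the alignment of 1 byte for the
plain C table is an assumption. The 16 and 32 byte values correspond to
the XMM and YMM register widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU feature flags. */
enum { CPU_FLAG_SSE2 = 1, CPU_FLAG_AVX2 = 2 };

/* One table groups the alignment and every routine of an implementation. */
typedef struct {
    size_t align;  /* bitmap alignment in bytes, same for every width */
    void (*fill)(uint8_t *dst, size_t n, uint8_t value);
    void (*add_sat)(uint8_t *dst, const uint8_t *src, size_t n);
} BitmapEngine;

static void fill_c(uint8_t *dst, size_t n, uint8_t value)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

static void add_sat_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned) dst[i] + src[i];
        dst[i] = sum > 255 ? 255 : (uint8_t) sum;  /* saturate at 255 */
    }
}

/* Tables are constant data built at compile time. The SSE2/AVX2 entries
 * would point at assembly routines; C stand-ins are used here. */
static const BitmapEngine engine_c    = {  1, fill_c, add_sat_c };
static const BitmapEngine engine_sse2 = { 16, fill_c, add_sat_c };
static const BitmapEngine engine_avx2 = { 32, fill_c, add_sat_c };

/* Callers keep a single pointer to the chosen table. */
static const BitmapEngine *select_engine(unsigned cpu_flags)
{
    if (cpu_flags & CPU_FLAG_AVX2)
        return &engine_avx2;
    if (cpu_flags & CPU_FLAG_SSE2)
        return &engine_sse2;
    return &engine_c;
}

/* Row stride rounded up to the engine's alignment (a power of two). */
static size_t aligned_stride(const BitmapEngine *engine, size_t width)
{
    size_t a = engine->align;
    assert(a && (a & (a - 1)) == 0);
    return (width + a - 1) & ~(a - 1);
}
```

With this layout a caller writes engine->fill(buf, n, 0) and never needs
to know which variant is active; the stride depends only on the table in
use, not on the bitmap width.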
|