video: generally try to align image data on 64 bytes

Generally, using x86 SIMD efficiently (or crash-free) requires aligning all data on boundaries of 16, 32, or 64 (depending on instruction set used). 64 bytes is needed or AVX-512, 32 for old AVX, 16 for SSE. Both FFmpeg and zimg usually require aligned data for this reason. FFmpeg is very unclear about alignment. Yes, it requires you to align data pointers and strides. No, it doesn't tell you how much, except sometimes (libavcodec has a legacy-looking avcodec_align_dimensions2() API function, that requires a heavy-weight AVCodecContext as argument). Sometimes, FFmpeg will take a shit on YOUR and ITS OWN alignment. For example, vf_crop will randomly reduce alignment of data pointers, depending on the crop parameters. On the other hand, some libavfilter filters or libavcodec encoders may randomly crash if they get the wrong alignment. I have no idea how this thing works at all. FFmpeg usually doesn't seem to signal alignment internal anywhere, and usually leaves it to av_malloc() etc. to allocate with proper alignment. libavutil/mem.c currently has a ALIGN define, which is set to 64 if FFmpeg is built with AVX-512 support, or as low as 16 if built without any AVX support. The really funny thing is that a normal FFmpeg build will e.g. align tiny string allocations to 64 bytes, even if the machine does not support AVX at all. For zimg use (in a later commit), we also want guaranteed alignment. Modern x86 should actually not be much slower at unaligned accesses, but that doesn't help. zimg's dumb intrinsic code apparently randomly chooses between aligned or unaligned accesses (depending on compiler, I guess), and on some CPUs these can even cause crashes. So just treat the requirement to align as a fact of life. All this means that we should probably make sure our own allocations are 64 bit aligned. This still doesn't guarantee alignment in all cases, but it's slightly better than before. This also makes me wonder whether we should always override libavcodec's buffer pool, just so we have a guaranteed alignment. Currently, we only do that if --vd-lavc-dr is used (and if that actually works). On the other hand, it always uses DR on my machine, so who cares.
author: wm4 <wm4@nowhere> 2019-07-15 03:06:47 +0200
committer: wm4 <wm4@nowhere> 2019-09-19 20:37:05 +0200
commit: e1157cb6e8191f98b60813dc91342e3baf577d92 (patch)
tree: 4e60d1bc521848f1249856b15edd327370d21a91
parent: 36dd2348a1e118b44fcccd3c664af7c5f960b6bb (diff)
download: mpv-e1157cb6e8191f98b60813dc91342e3baf577d92.tar.bz2
mpv-e1157cb6e8191f98b60813dc91342e3baf577d92.tar.xz
5 files changed, 10 insertions, 4 deletions
diff --git a/video/decode/vd_lavc.c b/video/decode/vd_lavc.c
index e07eecc2d2..6c356e2e01 100644
--- a/video/decode/vd_lavc.c
+++ b/video/decode/vd_lavc.c
@@ -860,7 +860,7 @@ static int get_buffer2_direct(AVCodecContext *avctx, AVFrame *pic, int flags)
 
     // We assume that different alignments are just different power-of-2s.
     // Thus, a higher alignment always satisfies a lower alignment.
-    int stride_align = 0;
+    int stride_align = MP_IMAGE_BYTE_ALIGN;
     for (int n = 0; n < AV_NUM_DATA_POINTERS; n++)
         stride_align = MPMAX(stride_align, linesize_align[n]);
 
diff --git a/video/img_format.c b/video/img_format.c
index f7790aa290..be81e15b41 100644
--- a/video/img_format.c
+++ b/video/img_format.c
@@ -539,7 +539,7 @@ int main(int argc, char **argv)
             fr->format = fmt;
             fr->width = 128;
             fr->height = 128;
-            int err = av_frame_get_buffer(fr, SWS_MIN_BYTE_ALIGN);
+            int err = av_frame_get_buffer(fr, MP_IMAGE_BYTE_ALIGN);
             assert(err >= 0);
             struct mp_image *mpi = mp_image_alloc(mpfmt, fr->width, fr->height);
             assert(mpi);
diff --git a/video/mp_image.c b/video/mp_image.c
index f846b0d3d3..74ca80db21 100644
--- a/video/mp_image.c
+++ b/video/mp_image.c
@@ -174,7 +174,7 @@ static bool mp_image_alloc_planes(struct mp_image *mpi)
     assert(!mpi->planes[0]);
     assert(!mpi->bufs[0]);
 
-    int align = SWS_MIN_BYTE_ALIGN;
+    int align = MP_IMAGE_BYTE_ALIGN;
 
     int size = mp_image_get_alloc_size(mpi->imgfmt, mpi->w, mpi->h, align);
     if (size < 0)
diff --git a/video/mp_image.h b/video/mp_image.h
index 4727ed6f8d..0eed3eb43c 100644
--- a/video/mp_image.h
+++ b/video/mp_image.h
@@ -28,6 +28,12 @@
 #include "csputils.h"
 #include "video/img_format.h"
 
+// Assumed minimum align needed for image allocation. It's notable that FFmpeg's
+// libraries except libavcodec don't really know what alignment they want.
+// Things will randomly crash or get slower if the alignment is not satisfied.
+// Whatever. This value should be pretty safe with current CPU architectures.
+#define MP_IMAGE_BYTE_ALIGN 64
+
 #define MP_IMGFIELD_TOP_FIRST 0x02
 #define MP_IMGFIELD_REPEAT_FIRST 0x04
 #define MP_IMGFIELD_INTERLACED 0x20
diff --git a/video/sws_utils.h b/video/sws_utils.h
index 41472b5ecf..629e6be2dd 100644
--- a/video/sws_utils.h
+++ b/video/sws_utils.h
@@ -11,7 +11,7 @@ struct mpv_global;
 // libswscale currently requires 16 bytes alignment for row pointers and
 // strides. Otherwise, it will print warnings and use slow codepaths.
 // Guaranteed to be a power of 2 and > 1.
-#define SWS_MIN_BYTE_ALIGN 16
+#define SWS_MIN_BYTE_ALIGN MP_IMAGE_BYTE_ALIGN
 
 extern const int mp_sws_hq_flags;
 extern const int mp_sws_fast_flags;
author	wm4 <wm4@nowhere>	2019-07-15 03:06:47 +0200
committer	wm4 <wm4@nowhere>	2019-09-19 20:37:05 +0200
commit	e1157cb6e8191f98b60813dc91342e3baf577d92 (patch)
tree	4e60d1bc521848f1249856b15edd327370d21a91
parent	36dd2348a1e118b44fcccd3c664af7c5f960b6bb (diff)
download	mpv-e1157cb6e8191f98b60813dc91342e3baf577d92.tar.bz2 mpv-e1157cb6e8191f98b60813dc91342e3baf577d92.tar.xz