client API: fix potential deadlock problems by throwing more shit at it

The render API (vo_libmpv) had potential deadlock problems with MPV_RENDER_PARAM_ADVANCED_CONTROL. This required vd-lavc-dr to be enabled (the default). I never observed these deadlocks in the wild (doesn't mean they didn't happen), although I could specifically provoke them with some code changes. The problem was mostly about DR (direct rendering, letting the video decoder write to OpenGL buffer memory). Allocating/freeing a DR image needs to be done on the OpenGL thread, even though _lots_ of threads are involved with handling images. Freeing a DR image is a special case that can happen any time. dr_helper.c does most of the evil magic of achieving this. Unfortunately, there was a (sort of) circular lock dependency: freeing an image while certain internal locks are held would trigger the user's context update callback, which in turn would call mpv_render_context_update(), which processed all pending free requests, and then acquire an internal lock - which the caller might not release until a further DR image could be freed. "Solve" this by making freeing DR images asynchronous. This is slightly risky, but actually not much. The DR images will be free'd eventually. The biggest disadvantage is probably that debugging might get trickier. Any solution to this problem will probably add images to free to some sort of queue, and then process it later. I considered making this more explicit (so there'd be a point where the caller forcibly waits for all queued items to be free'd), but discarded these ideas as this probably would only increase complexity. Another consequence is that freeing DR images on the GL thread is not synchronous anymore. Instead, it mpv_render_context_update() will do it with a delay. This seems roundabout, but doesn't actually change anything, and avoids additional code. This also fixes that the render API required the render API user to remain on the same thread, even though this wasn't documented. As such, it was a bug. OpenGL essentially forces you to do all GL usage on a single thread, but in theory the API user could for example move the GL context to another thread. The API bump is because I think you can't make enough noise about this. Since we don't backport fixes to old versions, I'm specifically stating that old versions are broken, and I'm supplying workarounds. Internally, dr_helper_create() does not use pthread_self() anymore, thus the vo.c change. I think it's better to make binding to the current thread as explicit as possible. Of course it's not sure that this fixes all deadlocks (probably not).
author: wm4 <wm4@nowhere> 2019-09-26 14:07:33 +0200
committer: wm4 <wm4@nowhere> 2019-09-26 14:14:49 +0200
commit: 4d43c79e4c1b76d70bf0e976e6c72945ce962140 (patch)
tree: 052ebcba5f713ee10959a5fe51a700fa1c53dbf0
parent: 3ee6d7db4e2e61465d18db05a8346086ffa0884b (diff)
download: mpv-4d43c79e4c1b76d70bf0e976e6c72945ce962140.tar.bz2
mpv-4d43c79e4c1b76d70bf0e976e6c72945ce962140.tar.xz
7 files changed, 81 insertions, 32 deletions
diff --git a/DOCS/client-api-changes.rst b/DOCS/client-api-changes.rst
index 5383456c55..5b595047d0 100644
--- a/DOCS/client-api-changes.rst
+++ b/DOCS/client-api-changes.rst
@@ -32,6 +32,11 @@ API changes
 
 ::
  --- mpv 0.30.0 ---
+ 1.105  - Fix deadlock problems with MPV_RENDER_PARAM_ADVANCED_CONTROL and if
+          the "vd-lavc-dr" option is enabled (which it is by default).
+          There were no actual API changes.
+          API users on older API versions and mpv releases should set
+          "vd-lavc-dr" to "no" to avoid these issues.
  1.104  - Deprecate struct mpv_opengl_drm_params. Replaced by mpv_opengl_drm_params_v2
         - Deprecate MPV_RENDER_PARAM_DRM_DISPLAY. Replaced by MPV_RENDER_PARAM_DRM_DISPLAY_V2.
  1.103  - redo handling of async commands
diff --git a/libmpv/client.h b/libmpv/client.h
index 2fea40de13..670f45c626 100644
--- a/libmpv/client.h
+++ b/libmpv/client.h
@@ -223,7 +223,7 @@ extern "C" {
  * relational operators (<, >, <=, >=).
  */
 #define MPV_MAKE_VERSION(major, minor) (((major) << 16) | (minor) | 0UL)
-#define MPV_CLIENT_API_VERSION MPV_MAKE_VERSION(1, 104)
+#define MPV_CLIENT_API_VERSION MPV_MAKE_VERSION(1, 105)
 
 /**
  * The API user is allowed to "#define MPV_ENABLE_DEPRECATED 0" before
diff --git a/libmpv/render.h b/libmpv/render.h
index 6d594e0c06..afea7c2e10 100644
--- a/libmpv/render.h
+++ b/libmpv/render.h
@@ -79,16 +79,6 @@ extern "C" {
  *    is logged. If you set MPV_RENDER_PARAM_ADVANCED_CONTROL, you promise that
  *    this won't happen, and must absolutely guarantee it, or a real deadlock
  *    will freeze the mpv core thread forever.
- *  - if MPV_RENDER_PARAM_ADVANCED_CONTROL is used, you currently must call all
- *    mpv_render*() API functions from the same thread on which the
- *    mpv_render_context was returned by mpv_render_context_create(). This
- *    requirement always existed. Not honoring it will lead to UB (deadlocks,
- *    use of invalid pthread_t handles). This requirement might be removed in
- *    the future, but will require some considerable work internal to libmpv.
- *    You can avoid this issue by setting "vd-lavc-dr" to "no".
- *  - MPV_RENDER_PARAM_ADVANCED_CONTROL has some other libmpv-internal problems,
- *    which may result in random deadlocks (see top of vo_libmpv.c).
- *    You can probably avoid this issue by setting "vd-lavc-dr" to "no".
  *
  * libmpv functions which are safe to call from a render thread are:
  *  - functions marked with "Safe to be called from mpv render API threads."
@@ -104,6 +94,18 @@ extern "C" {
  *  - if the mpv_handle parameter refers to a different mpv core than the one
  *    you're rendering for (very obscure, but allowed)
  *
+ * Note about old libmpv version:
+ *
+ *      Before API version 1.105 (basically in mpv 0.29.x), simply enabling
+ *      MPV_RENDER_PARAM_ADVANCED_CONTROL could cause deadlock issues. This can
+ *      be worked around by setting the "vd-lavc-dr" option to "no".
+ *      In addition, you were required to call all mpv_render*() API functions
+ *      from the same thread on which mpv_render_context_create() was originally
+ *      run (for the same the mpv_render_context). Not honoring it led to UB
+ *      (deadlocks, use of invalid pthread_t handles), even if you moved your GL
+ *      context to a different thread correctly.
+ *      These problems were addressed in API version 1.105 (mpv 0.30.0).
+ *
  * Context and handle lifecycle
  * ----------------------------
  *
diff --git a/video/out/dr_helper.c b/video/out/dr_helper.c
index e826d08dc4..78d4633efb 100644
--- a/video/out/dr_helper.c
+++ b/video/out/dr_helper.c
@@ -12,7 +12,10 @@
 #include "dr_helper.h"
 
 struct dr_helper {
+    pthread_mutex_t thread_lock;
     pthread_t thread;
+    bool thread_valid; // (POSIX defines no "unset" pthread_t value yet)
+
     struct mp_dispatch_queue *dispatch;
     atomic_ullong dr_in_flight;
 
@@ -28,6 +31,8 @@ static void dr_helper_destroy(void *ptr)
     // All references must have been freed on destruction, or we'll have
     // dangling pointers.
     assert(atomic_load(&dr->dr_in_flight) == 0);
+
+    pthread_mutex_destroy(&dr->thread_lock);
 }
 
 struct dr_helper *dr_helper_create(struct mp_dispatch_queue *dispatch,
@@ -38,15 +43,34 @@ struct dr_helper *dr_helper_create(struct mp_dispatch_queue *dispatch,
     struct dr_helper *dr = talloc_ptrtype(NULL, dr);
     talloc_set_destructor(dr, dr_helper_destroy);
     *dr = (struct dr_helper){
-        .thread = pthread_self(),
         .dispatch = dispatch,
         .dr_in_flight = ATOMIC_VAR_INIT(0),
         .get_image = get_image,
         .get_image_ctx = get_image_ctx,
     };
+    pthread_mutex_init(&dr->thread_lock, NULL);
     return dr;
 }
 
+void dr_helper_acquire_thread(struct dr_helper *dr)
+{
+    pthread_mutex_lock(&dr->thread_lock);
+    assert(!dr->thread_valid); // fails on API user errors
+    dr->thread_valid = true;
+    dr->thread = pthread_self();
+    pthread_mutex_unlock(&dr->thread_lock);
+}
+
+void dr_helper_release_thread(struct dr_helper *dr)
+{
+    pthread_mutex_lock(&dr->thread_lock);
+    // Fails on API user errors.
+    assert(dr->thread_valid);
+    assert(pthread_equal(dr->thread, pthread_self()));
+    dr->thread_valid = false;
+    pthread_mutex_unlock(&dr->thread_lock);
+}
+
 struct free_dr_context {
     struct dr_helper *dr;
     AVBufferRef *ref;
@@ -66,13 +90,19 @@ static void dr_thread_free(void *ptr)
 static void free_dr_buffer_on_dr_thread(void *opaque, uint8_t *data)
 {
     struct free_dr_context *ctx = opaque;
+    struct dr_helper *dr = ctx->dr;
+
+    pthread_mutex_lock(&dr->thread_lock);
+    bool on_this_thread =
+        dr->thread_valid && pthread_equal(ctx->dr->thread, pthread_self());
+    pthread_mutex_unlock(&dr->thread_lock);
 
     // The image could be unreffed even on the DR thread. In practice, this
     // matters most on DR destruction.
-    if (pthread_equal(ctx->dr->thread, pthread_self())) {
+    if (on_this_thread) {
         dr_thread_free(ctx);
     } else {
-        mp_dispatch_run(ctx->dr->dispatch, dr_thread_free, ctx);
+        mp_dispatch_enqueue(dr->dispatch, dr_thread_free, ctx);
     }
 }
 
diff --git a/video/out/dr_helper.h b/video/out/dr_helper.h
index cf37c570e2..ff1d268426 100644
--- a/video/out/dr_helper.h
+++ b/video/out/dr_helper.h
@@ -3,18 +3,35 @@
 // This is a helper for implementing thread-safety for DR callbacks. These need
 // to allocate GPU buffers on the GPU thread (e.g. OpenGL with its forced TLS),
 // and the buffers also need to be freed on the GPU thread.
+// This is not a helpful "Dr.", rather it represents Satan in form of C code.
 struct dr_helper;
 
 struct mp_image;
 struct mp_dispatch_queue;
 
-// This MUST be called on the "target" thread (it will call pthread_self()).
 // dr_helper_get_image() calls will use the dispatch queue to run get_image on
-// the target thread too.
+// a target thread, which processes the dispatch queue.
+// Note: the dispatch queue must process outstanding async. work before the
+//       dr_helper instance can be destroyed.
 struct dr_helper *dr_helper_create(struct mp_dispatch_queue *dispatch,
             struct mp_image *(*get_image)(void *ctx, int imgfmt, int w, int h,
                                           int stride_align),
             void *get_image_ctx);
 
+// Make DR release calls (freeing images) reentrant if they are called on this
+// (pthread_self()) thread. That means any free call will directly release the
+// image as allocated with get_image().
+// Only 1 thread can use this at a time. Note that it would make no sense to
+// call this on more than 1 thread, as get_image is assumed not thread-safe.
+void dr_helper_acquire_thread(struct dr_helper *dr);
+
+// This _must_ be called on the same thread as dr_helper_acquire_thread() was
+// called. Every release call must be paired with an acquire call.
+void dr_helper_release_thread(struct dr_helper *dr);
+
+// Allocate an image by running the get_image callback on the target thread.
+// Always blocks on dispatch queue processing. This implies there is no way to
+// allocate a DR'ed image on the render thread (at least not in a way which
+// actually works if you want foreign threads to be able to free them).
 struct mp_image *dr_helper_get_image(struct dr_helper *dr, int imgfmt,
                                      int w, int h, int stride_align);
diff --git a/video/out/vo.c b/video/out/vo.c
index 7aafe28c64..e5b4198f76 100644
--- a/video/out/vo.c
+++ b/video/out/vo.c
@@ -1046,8 +1046,10 @@ static void *vo_thread(void *ptr)
 
     mpthread_set_name("vo");
 
-    if (vo->driver->get_image)
+    if (vo->driver->get_image) {
         in->dr_helper = dr_helper_create(in->dispatch, get_image_vo, vo);
+        dr_helper_acquire_thread(in->dr_helper);
+    }
 
     int r = vo->driver->preinit(vo) ? -1 : 0;
     mp_rendezvous(vo, r); // init barrier
diff --git a/video/out/vo_libmpv.c b/video/out/vo_libmpv.c
index b6c4d9fb06..fffb65b29a 100644
--- a/video/out/vo_libmpv.c
+++ b/video/out/vo_libmpv.c
@@ -46,21 +46,9 @@
  *  And: render thread > VO (wait for present)
  *       VO > render thread (wait for present done, via timeout)
  *
- *  Inherent locking bugs with advanced_control:
- *
- *      In theory, it can deadlock on ctx->lock. Consider if the VO thread calls
- *      forget_frames(), which it does under ctx->lock (e.g. VOCTRL_RESET).
- *      If a frame was a DR image, dr_helper.c will call mp_dispatch_run().
- *      This in turn will call the wakeup callback set with
- *      mpv_render_context_set_update_callback(). The API user will eventually
- *      call mpv_render_context_update(), which performs the dispatch queue work
- *      queued by dr_helper.c.
- *      And then the function tries to acquire ctx->lock. This can deadlock
- *      under specific circumstances. It will _not_ deadlock if the queued work
- *      made the caller release the lock. However, if the caller made queue
- *      some more work (like freeing a second frame), and will keep the lock
- *      until it gets a reply. Both threads will wait on each other, and no
- *      progress can be made.
+ *  Locking gets more complex with advanced_control enabled. Use
+ *  mpv_render_context.dispatch with care; synchronous calls can add lock
+ *  dependencies.
  */
 
 struct vo_priv {
@@ -306,6 +294,11 @@ void mpv_render_context_free(mpv_render_context *ctx)
     assert(!atomic_load(&ctx->in_use));
     assert(!ctx->vo);
 
+    // With the dispatch queue not being served anymore, allow frame free
+    // requests from this thread to be served directly.
+    if (ctx->dr)
+        dr_helper_acquire_thread(ctx->dr);
+
     // Possibly remaining outstanding work.
     mp_dispatch_queue_process(ctx->dispatch, 0);
author	wm4 <wm4@nowhere>	2019-09-26 14:07:33 +0200
committer	wm4 <wm4@nowhere>	2019-09-26 14:14:49 +0200
commit	4d43c79e4c1b76d70bf0e976e6c72945ce962140 (patch)
tree	052ebcba5f713ee10959a5fe51a700fa1c53dbf0
parent	3ee6d7db4e2e61465d18db05a8346086ffa0884b (diff)
download	mpv-4d43c79e4c1b76d70bf0e976e6c72945ce962140.tar.bz2 mpv-4d43c79e4c1b76d70bf0e976e6c72945ce962140.tar.xz