From 2a7e17fe4e2875d7a85262c2efe16bbdd797a890 Mon Sep 17 00:00:00 2001 From: wm4 Date: Fri, 27 Dec 2019 17:09:47 +0100 Subject: DOCS/tech-overview.txt: add lots of irrelevant blabla Thought it might be useful to document some of these things, instead of explaining them over and over again. But I can guarantee that nobody will ever read all this. (Independent of its quality and completeness.) --- DOCS/tech-overview.txt | 373 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 373 insertions(+) diff --git a/DOCS/tech-overview.txt b/DOCS/tech-overview.txt index 8d75adfff3..fcd32b9fff 100644 --- a/DOCS/tech-overview.txt +++ b/DOCS/tech-overview.txt @@ -242,3 +242,376 @@ sub/: etc/: The file input.conf is actually integrated into the mpv binary by the build system. It contains the default keybindings. + +Best practices and Concepts within mpv +====================================== + +General contribution etc. +------------------------- + +See DOCS/contribute.md. + +Error checking +-------------- + +If an error is relevant, it should be handled. If it's interesting, log the +error. However, mpv often keeps errors silent and reports failures somewhat +coarsely by propagating them upwards the caller chain. This is OK, as long as +the errors are not very interesting, or would require a developer to debug it +anyway (in which case using a debugger would be more convenient, and the +developer would need to add temporary debug printfs to get extremely detailed +information which would not be appropriate during normal operation). + +Basically, keep a balance on error reporting. But always check them, unless you +have a good argument not to. + +Memory allocation errors (OOM) are a special class of errors. Normally such +allocation failures are not handled "properly". Instead, abort() is called. +(New code should use MP_HANDLE_OOM() for this.) This is done out of laziness and +for convenience, and due to the fact that MPlayer/mplayer2 never handled it +correctly. (MPlayer varied between handling it correctly, trying to do so but +failing, and just not caring, while mplayer2 started using abort() for it.) + +This is justifiable in a number of ways. Error handling paths are notoriously +untested and buggy, so merely having them won't make your program more reliable. +Having these error handling paths also complicates non-error code, due to the +need to roll back state at any point after a memory allocation. + +Take any larger body of code, that is supposed to handle OOM, and test whether +the error paths actually work, for example by overriding malloc with a version +that randomly fails. You will find bugs quickly, and often they will be very +annoying to fix (if you can even reproduce them). + +In addition, a clear indication that something went wrong may be missing. On +error your program may exhibit "degraded" behavior by design. Consider a video +encoder dropping frames somewhere in the middle of a video due to temporary +allocation failures, instead of just exiting with an errors. In other cases, it +may open conceptual security holes. Failing fast may be better. + +mpv uses GPU APIs, which may be break on allocation errors (because driver +authors will have the same issues as described here), or don't even have a real +concept for dealing with OOM (OpenGL). + +libmpv is often used by GUIs, which I predict always break if OOM happens. + +Last but not least, OSes like Linux use "overcommit", which basically means that +your program may crash any time OOM happens, even if it doesn't use malloc() at +all! + +But still, don't just assume malloc() always succeeds. Use MP_HANDLE_OOM(). The +ta* APIs do this for you. The reason for this is that dereferencing a NULL +pointer can have security relevant consequences if large offsets are involved. +Also, a clear error message is better than a random segfault. + +Some big memory allocations are checked anyway. For example, all code must +assume that allocating video frames or packets can fail. (The above example +of dropping video frames during encoding is entirely possible in mpv.) + +Undefined behavior +------------------ + +Undefined behavior (UB) is a concept in the C language. C is famous for being a +language that makes it almost impossible to write working code, because +undefined behavior is so easily triggered, compilers will happily abuse it to +generate "faster" code, debugging tools will shout at you, and sometimes it +even means your code doesn't work. + +There is a lot of literature on this topic. Read it. + +(In C's defense, UB exists in other languages too, but since they're not used +for low level infrastructure, and/or these languages are at times not rigorously +defined, simply nobody cares. However, the C standard committee is still guilty +for not addressing this. I'll admit that I can't even tell from the standard's +gibberish whether some specific behavior is UB or not. It's written like tax +law.) + +In mpv, we generally try to avoid undefined behavior. For one, we want portable +and reliable operation. But more importantly, we want clean output from +debugging tools, in order to find real bugs more quickly and effectively. + +Avoid the "works in practice" argument. Once debugging tools come into play, or +simply when "in practice" stops being true, this will all get back to you in a +bad way. + +Global state, library safety +---------------------------- + +Mutable global state is when code uses global variables that are not read-only. +This must be avoided in mpv. Always use context structs that the caller of +your code needs to allocate, and whose pointers are passed to your functions. + +Library safety means that your code (or library) can be used by a library +without causing conflicts with other library users in the same process. To any +piece of code, a "safe" library's API can simply be used, without having to +worry about other API users that may be around somewhere. + +Libraries are often not library safe, because they they use global mutable state +or other "global" resources. Typical examples include use of signals, simple +global variables (like hsearch() in libc), or internal caches not protected by +locks. + +A surprisingly high number of libraries are not library safe because they need +global initialization. Typically they provide an API function, which +"initializes" the library, and which must be called before calling any other +API functions. Often, you are to provide global configuration parameters, which +can change the behavior of the library. If two libraries A and B use library C, +but A and B initialize C with different parameters, something "bad" may happen. +In addition, these global initialization functions are often not thread-safe. So +if A and B try to initialize C at the same time (from different threads and +without knowing about each other), it may cause undefined behavior. (libcurl is +a good example of both of these issues. FFmpeg and some TLS libraries used to be +affected, but improved.) + +This is so bad because library A and B from the previous example most likely +have no way to cooperate, because they're from different authors and have no +business knowing each others. They'd need a library D, which wraps library C +in a safe way. Unfortunately, typically something worse happens: libraries get +"infected" by the unsafeness of its sub-libraries, and export a global init API +just to initialize the sub-libraries. In the previous example, libraries A and B +would export global init APIs just to init library C, even though the rest of +A/B are clean and library safe. (Again, libcurl is an example of this, if you +subtract other historic anti-features.) + +The main problem with library safety is that its lack propagates to all +libraries using the library. + +We require libmpv to be library safe. This is not really possible, because some +libraries are not library safe (FFmpeg, Xlib, partially ALSA). However, for +ideological reasons, there is no global init API, and best effort is made to try +to avoid problems. + +libmpv has some features that are not library safe, but which are disabled by +default (such as terminal usage aka stdout, or JSON IPC blocking SIGPIPE for +internal convenience). + +A notable, very disgustingly library unsafe behavior of libmpv is calling +abort() on some memory allocation failure. See error checking section. + +Logging +------- + +All logging and terminal output in mpv goes through the functions and macros +provided in common/msg.h. This is in part for library safety, and in part to +make sure users can silence all output, or to redirect the output elsewhere, +like a log file or the internal console.lua script. + +Locking +------- + +See generally available literature. In mpv, we use pthread for this. + +Always keep locking clean. Don't skip locking just because it will work "in +practice". (See undefined behavior section.) If your use case is simple, you may +use C11 atomics( osdep/atomic.h for partial C99 support), but most likely you +will only hurt yourself and others. + +Always make clear which fields in a struct are protected by which lock. If a +field is immutable, or simply not thread-safe (e.g. state for a single worker +thread), document it as well. + +Internal mpv APIs are assumed to be not thread-safe by default. If they have +special guarantees (such as being usable by more than one thread at a time), +these should be explicitly documented. + +All internal mpv APIs must be free of global state. Even if a component is not +thread-safe, multiple threads can use _different_ instances of it without any +locking. + +On a side note, recursive locks may seem convenient at first, but introduces +additional problems with condition variables and locking hierarchies. They +should be avoided. + +Locking hierarchy +----------------- + +A simple way to avoid deadlocks with classic locking is to define a locking +hierarchy or lock order. If all threads acquire locks in the same order, no +deadlocks will happen. + +For example, a "leaf" lock is a lock that is below all other locks in the +hierarchy. You can acquire it any time, as long as you don't acquire other +locks while holding it. + +Unfortunately, C has no way to declare or check the lock order, so you should at +least document it. + +In addition, try to avoid exposing locks to the outside. Making the declaration +of a lock private to a specific .c file (and _not_ exporting accessors or +lock/unlock that manipulate the lock) is a good idea. Your component's API may +acquire internal locks, but should release them when returning. Keeping the +entire locking in a single file makes it easy to check it. + +Avoiding callback hell +---------------------- + +mpv code is separated in components, like the "frontend" (i.e. MPContext mpctx), +VOs, AOs, demuxers, and more. The frontend usually calls "down" the usage +hierarchy: mpctx almost on top, then things like vo/ao, and utility code on the +very bottom. + +"Callback hell" is when when components call both up and down the hierarchy, +which for example leads to accidentally recursion, reentrancy problems, or +locking nightmares. This is avoided by (mostly) calling only down the hierarchy. +Basically the call graph forms a DAG. The other direction is handled by event +queues, wakeup callbacks, and similar mechanisms. + +Typically, a component provides an API, and does not know anything about its +user. The API user (component higher in the hierarchy) polls the state of the +lower component when needed. + +This also enforces some level of modularization, and with some luck the locking +hierarchy. (Basically, locks of lower components automatically become leaf +locks.) Another positive effect is simpler memory management. + +(Also see e.g.: http://250bpm.com/blog:24) + +Wakeup callbacks +---------------- + +This is a common concept in mpv. Even the public API uses it. It's used when an +API has internal threads (or otherwise triggers asynchronous events), but the +component call hierarchy needs to be kept. The wakeup callback is the only +exception to the call hierarchy, and always calls up. + +For example, vo spawns a thread that the API user. The mpv frontend is oblivious +to this. vo simply provides a thread-safe API. vo needs to notify the API user +of new events. But the vo event producer is on the vo thread - it can't simply +invoke a callback back into the API user, because then the API user has to deal +with locking, despite not using threads. In addition, this will probably cause +problems like mentioned in the "callback hell" section, especially lock order +issues. + +The solution is the wakeup callback. It merely unblocks the API user from +waiting, and the API user then uses the normal vo API to examine whether or +which state changed. As a concept, it documents what a wakeup callback is +allowed to do and what not, to avoid the aforementioned problems. + +Generally, you are not allowed to call any API from the wakeup callback. You +just do whatever is needed to unblock your thread. For example, if it's waiting +on a mutex/condition variable, acquire the mutex, set a change flag, signal +the condition variable, unlock, return. (This mutex must not be held when +calling the API. It must be a leaf lock.) + +Restricting the wakeup callback like this sidesteps any reentrancy issues and +other complexities. The API implementation can simply hold internal (and +non-recursive) locks while invoking the wakeup callback. + +The API user still needs to deal with locking (probably), but there's only the +need to implement a single "receiver", that can handle the entire API of the +used component. (Or multiple APIs - MPContext for example has only 1 wakeup +callback that handles all AOs, VOs, input, demuxers, and more. It simple re-runs +the playloop.) + +You could get something more advanced by turning this into a message queue. The +API would append a message to the queue, and the API user can read it. But then +you still need a way to "wakeup" the API user (unless you force the API user +to block on your API, which will make things inconvenient for the API user). You +also need to worry about what happens if the message queue overruns (you either +lose messages or have unbounded memory usage). In the mpv public API, the +distinction between message queue and wakeup callback is sort of blurry, because +it does provide a message queue, but an additional wakeup callback, so API +users are not required to call mpv_wait_event() with a high timeout. + +mpv itself prefers using wakeup callbacks over a generic event queue, because +most times an event queue is not needed (or complicates things), and it is +better to do it manually. + +(You could still abstract the API user side of wakeup callback handling, and +avoid reimplementing it all the time. Although mp_dispatch_queue already +provides mechanisms for this.) + +Condition variables +------------------- + +They're used whenever a thread needs to wait for something, without nonsense +like sleep calls or busy waiting. mpv uses the standard pthread API for this. +There's a lot of literature on it. Read it. + +For initial understanding, it may be helpful to know that condition variables +are not variables that signal a condition. pthread_cond_t does not have any +state per-se. Maybe pthread_cond_t would better be named pthread_interrupt_t, +because its sole purpose is to interrupt a thread waiting via pthread_cond_wait() +(or similar). The "something" in "waiting for something" can be called +predicate (to avoid confusing it with "condition"). Consult literature for the +proper terms. + +The very short version is: + + // --- Shared declarations + + pthread_mutex_t lock; + pthread_cond_t cond_var; + struct something state_var; // protected by lock, changes signaled by cond_var + + // --- Waiter thread + + pthread_mutex_lock(&lock); + + // Wait for a change in state_var. We want to wait until predicate_fulfilled() + // returns true. + // Must be a loop for 2 reasons: + // 1. cond_var may be associated with other conditions too + // 2. pthread_cond_wait() can have sporadic wakeups + while (!predicate_fulfilled(&state_var)) { + // This unlocks, waits for cond_var to be signaled, and then locks again. + // The _whole_ point of cond_var is that unlocking and waiting for the + // signal happens atomically. + pthread_cond_wait(&cond_var, &lock); + } + + // Here you may react to the state change. The state cannot change + // asynchronously as long as you still hold the lock (and didn't release + // and reacquire it). + // ... + + pthread_mutex_unlock(&lock); + + // --- Signaler thread + + pthread_mutex_lock(&lock); + + // Something changed. Update the shared variable with the new state. + update_state(&state_var); + + // Notify that something changed. This will wake up the waiter thread if + // it's blocked in pthread_cond_wait(). If not, nothing happens. + pthread_cond_broadcast(&cond_var); + + // Fun fact: good implementations wake up the waiter only when the lock is + // released, to reduce kernel scheduling overhead. + pthread_mutex_unlock(&lock); + + +Some basic rules: + 1. Always access your state under proper locking + 2. Always check your predicate before every call to pthread_cond_wait() + (And don't call pthread_cond_wait() if the predicate is fulfilled.) + 3. Always call pthread_cond_wait() in a loop + (And only if your predicate failed without releasing the lock..) + 4. Always call pthread_cond_broadcast()/_signal() inside of its associated + lock + +mpv sometimes violates rule 3, and leaves "retrying" (i.e. looping) to the +caller. + +Common pitfalls: + - Thinking that pthread_cond_t is some kind of semaphore, or holds any + application state or the user predicate (it _only_ wakes up threads + that are at the same time blocking on pthread_cond_wait() and friends, + nothing else) + - Changing the predicate, but not updating all pthread_cond_broadcast()/ + _signal() calls correctly + - Forgetting that pthread_cond_wait() unlocks the lock (other threads can + and must acquire the lock) + - Holding multiple nested locks while trying to wait (=> deadlock, violates + the lock order anyway) + - Waiting for a predicate correctly, but unlocking/relocking before acting + on it (unlocking allows arbitrary state changes) + - Confusing which lock/condition var. is used to manage a bit of state + +Generally available literature probably has better examples and explanations. + +Using condition variables the proper way is generally preferred over using more +messy variants of them. (Just saying because on win32, "Event" exists, and it's +inferior to condition variables. Try to avoid the win32 primitives, even if +you're dealing with Windows-only code.) -- cgit v1.2.3