|author||wm4 <wm4@nowhere>||2014-10-23 11:13:59 +0200|
|committer||wm4 <wm4@nowhere>||2014-10-23 11:13:59 +0200|
manpage: ipc: comment about invalid UTF-8
Some rationale for the documented/suggested behavior: It's not really clear what to do with invalid UTF-8, since JSON simply can't transport this information. Such strings could be transferred as byte arrays, but that would be very verbose and inconvenient, and it would be hard to distinguish strings encoded this way from actual arrays. There are many other ways this could be handled. For example, invalid sequences could be replaced with '?'. Or it could be done like Python, which uses certain reserved Unicode codepoints to "tunnel" invalid bytes through. Which of these works really depends on the application. And since this can be done entirely on the byte level (invalid UTF-8 sequences can appear only in strings in our case), it's best to leave this to the receiver.
1 files changed, 12 insertions, 0 deletions
diff --git a/DOCS/man/ipc.rst b/DOCS/man/ipc.rst
index 4666e7367e..91ed7c6d69 100644
@@ -155,3 +155,15 @@ extra commands can also be used as part of the protocol:
Undo one ``suspend`` call. ``suspend`` increments an internal counter, and
``resume`` decrements it. When 0 is reached, the player is actually resumed.
+Normally, all strings are in UTF-8. Occasionally, strings are in some broken
+encoding (this often happens with file tags, and filenames on many Unixes
+are not required to be in UTF-8 either). This means that mpv sometimes sends
+invalid JSON. If that is a problem for the client application's parser, it
+should filter the raw data for invalid UTF-8 sequences and perform the
+desired replacement before feeding the data to its JSON parser.
+mpv will not attempt to construct invalid UTF-8 with broken escape sequences.
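The receiver-side filtering described above can be sketched in Python. This is only an illustration of one possible client, not part of the commit: the function name `parse_mpv_line` and the sample reply line are hypothetical, and the choice of replacement policy is left to the application, as the commit message argues.

```python
import json


def parse_mpv_line(raw: bytes, errors: str = "replace") -> dict:
    """Decode one raw line from the IPC socket, then parse it as JSON.

    Invalid UTF-8 is handled at the byte level before the JSON parser
    sees the data. With errors="replace", each invalid sequence becomes
    U+FFFD. With errors="surrogateescape", invalid bytes are mapped to
    reserved surrogate codepoints so they can be recovered later.
    """
    text = raw.decode("utf-8", errors=errors)
    return json.loads(text)


# Hypothetical reply line: a filename containing a lone Latin-1 byte 0xE9.
sample = b'{"data":"caf\xe9.mkv","error":"success"}\n'

# Policy 1: substitute a replacement character for the broken byte.
replaced = parse_mpv_line(sample)

# Policy 2: tunnel the broken byte through, then recover the original bytes.
tunneled = parse_mpv_line(sample, errors="surrogateescape")
original_bytes = tunneled["data"].encode("utf-8", errors="surrogateescape")
```

The `replace` policy is lossy but always yields displayable text. The `surrogateescape` policy keeps the original bytes recoverable, which matters if the string is later passed back as a filename.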