osx: consistent normalisation when searching for external files

several unicode characters can be encoded in two different ways, either in a precomposed (NFC) or a decomposed (NFD) representation. everywhere except on macOS, specifically on HFS+, precomposed strings are used. furthermore, on macOS we can get either precomposed or decomposed strings, for example when volumes that are not HFS+ formatted are used. that can be the case for network mounted devices (SMB, NFS) or optical/removable devices (UDF). as a result, strings that are actually equal can compare as unequal when they come from different sources, like the command line and the filesystem. this makes it mainly a problem on macOS systems.

one case that can potentially break is the sub-auto option. to prevent that we convert the search string as well as the string we search in to the same normalised representation, specifically the precomposed form, which is used everywhere else. this could be a problem on other platforms too, though the chance of it occurring is very small. for those platforms we don't convert anything and just fall back to the input.

Fixes #4016
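to illustrate the mismatch described above, here is a minimal C sketch. it is not part of this patch, and the names NAME_NFC, NAME_NFD, same_name and demo are hypothetical. it only shows that the two UTF-8 encodings of the same visible file name differ byte for byte, so a plain comparison treats them as different names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// "é.srt" with "é" as the single precomposed code point U+00E9 (NFC).
static const char NAME_NFC[] = "\xC3\xA9.srt";
// "é.srt" with "é" as "e" plus combining acute accent U+0301 (NFD).
static const char NAME_NFD[] = "e\xCC\x81.srt";

// Plain byte-wise comparison, as code unaware of normalisation would do.
static bool same_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

// Hypothetical demo: both names render identically, yet they do not match.
static void demo(void)
{
    assert(same_name(NAME_NFC, NAME_NFC));
    assert(!same_name(NAME_NFC, NAME_NFD));
}
```

converting both strings to one normalisation form before comparing, as this commit does for the external file search, is what makes such a comparison succeed.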
author: Akemi <der.richter@gmx.de> 2017-01-24 19:48:02 +0100
committer: Akemi <der.richter@gmx.de> 2017-02-02 16:21:04 +0100
commit: 8bbdecea8358e1a00630f095d9859bef9bc1535d (patch)
tree: 3ce2173a90b5401602508877016264e96c9a90ef /misc
parent: ce23dfa2fab75756bb051df112cf3ab9ad61e802 (diff)
download: mpv-8bbdecea8358e1a00630f095d9859bef9bc1535d.tar.bz2 mpv-8bbdecea8358e1a00630f095d9859bef9bc1535d.tar.xz
2 files changed, 6 insertions, 1 deletions
diff --git a/misc/charset_conv.c b/misc/charset_conv.c
index 7dc1a87783..51e55c6338 100644
--- a/misc/charset_conv.c
+++ b/misc/charset_conv.c
@@ -234,5 +234,9 @@ bstr mp_iconv_to_utf8(struct mp_log *log, bstr buf, const char *cp, int flags)
 #endif
 
 failure:
-    return bstr_sanitize_utf8_latin1(NULL, buf);
+    if (flags & MP_NO_LATIN1_FALLBACK) {
+        return buf;
+    } else {
+        return bstr_sanitize_utf8_latin1(NULL, buf);
+    }
 }
diff --git a/misc/charset_conv.h b/misc/charset_conv.h
index 9be7a50961..ccaa17e3c9 100644
--- a/misc/charset_conv.h
+++ b/misc/charset_conv.h
@@ -10,6 +10,7 @@ enum {
     MP_ICONV_VERBOSE = 1,       // print errors instead of failing silently
     MP_ICONV_ALLOW_CUTOFF = 2,  // allow partial input data
     MP_STRICT_UTF8 = 4,         // don't fall back to UTF-8-BROKEN when guessing
+    MP_NO_LATIN1_FALLBACK = 8,  // fall back to input buffer instead of latin1
 };
 
 bool mp_charset_is_utf8(const char *user_cp);