summaryrefslogtreecommitdiffstats
path: root/misc/charset_conv.c
Commit message (Collapse)AuthorAgeFilesLines
* sub: detect charset in demuxerwm42015-12-171-0/+7
| | | | | | | | | | | | Slightly simpler, and removes the need to pre-read all subtitle packets. This still does the subtitle charset conversion on the packet level (instead converting when parsing the file), so in theory this still could provide a way to change the charset at runtime. But maybe even this should be removed, as FFmpeg is somewhat likely to get its own charset detection and conversion mechanism in the future. (Would have to keep the subtitle file in memory to allow changing the charset on the fly, I guess.)
* demux_libass: remove this demuxerwm42015-11-111-18/+0
| | | | | | | | | This loaded external .ass files via libass. libavformat's .ass reader is now good enough, so use that instead. Apparently libavformat still doesn't support fonts embedded into text .ass files, but support for this has been accidentally broken in mpv for a while anyway. (And only 1 person complained.)
* sub: fix --sub-codepage UTF-8 with fallbackwm42015-09-011-0/+4
| | | | | | | | Fixes e.g --sub-codepage=utf8:gb18030 if the subtitle us UTF-8. This was broken in commit e5d31808. Also log the detected charset in verbose mode.
* charset_conv: use our own UTF-8 check with ENCA onlywm42015-08-041-6/+5
| | | | | | | | | | Some charsets can look like valid UTF-8, but aren't UTF-8. One example is ISO-2022-JP. While ENCA apparently likes to get misdetect real UTF-8, this is not the case with uchardet. uchardet can detect ISO-2022-JP correctly, but didn't even get to try, because our own UTF-8 check succeeded. So run the UTF-8 check when using ENCA only. Fixes #2195.
* charset_conv: "auto" encoding detection now uses uchardet.Jehan2015-08-041-1/+3
| | | | | If mpv is not built with uchardet, "enca" is still the fallback default encoding detection.
* charset_conv: fix switched parameterswm42015-08-021-1/+1
| | | | Fixes #2186.
* charset_conv: add uchardet supportwm42015-08-021-0/+39
| | | | | | | | | | | | | | For now, it needs to be explicitly selected. ENCA is still the default. This assumes uchardet returns iconv names. This doesn't seem to be always the case, and the result are lots of iconv errors. So explicitly check for this situation, and print a warning if it occurs. It's entirely possible that uchardet support is actually useless, because names are not necessarily iconv-compatible (but uchardet doesn't seem to document whether it attempts to return iconv-compatible names if possible). Fixes #908.
* charset_conv: make it possible to return an allocated string as guesswm42015-08-011-4/+8
| | | | | | | | | uchardet is written in C++, and thus doesn't appreciate the value of using static strings, and internally stores the guessed charset as allocated std::string. Add a minimal hack to deal with this. (I don't appreciate that the code is potentially harder to understand by returning either a static or allocated string, but I do appreciate for not having to litter the existing code with strdups.)
* sub: add detection via BOMwm42014-07-221-4/+30
| | | | | | | | | | | Useful for Windows stuff. Actually, ENCA support should catch this, but, well, whatever, everyone seems to hate ENCA. Detection with BOM is trivial, although it needs some hackery to integrate it with the existing autodetection support. For one, change the default value of --sub-codepage to make this easier. Probably fixes issue #937 (the second part).
* build: include <strings.h> for strcasecmp()wm42014-07-101-0/+1
| | | | | | | It happens to work without strings.h on glibc or with _GNU_SOURCE, but the POSIX standard requires including <strings.h>. Hopefully fixes OSX build.
* charset_conv: mp_msg conversionswm42013-12-211-20/+21
|
* Split mpvcore/ into common/, misc/, bstr/wm42013-12-171-0/+287