author wm4 <wm4@nowhere> 2016-12-09 19:51:29 +0100
committer wm4 <wm4@nowhere> 2016-12-09 19:51:29 +0100
commit c324bfab594cc9228f699d86c74e2b0da049bf58
tree 089de6b1c55cabbe780a0da2beaecec15ef38fdc /DOCS
parent 0eb87e1baff89e7a979df23f31ca918f9d107e44
charset_conv: simplify and change --sub-codepage option
As documented in interface-changes.rst. This makes it much easier to follow what the heck is going on. Whether this is adequate for real-world use is unknown.
Diffstat (limited to 'DOCS')
-rw-r--r-- DOCS/interface-changes.rst 6
-rw-r--r-- DOCS/man/options.rst 69
2 files changed, 23 insertions, 52 deletions
diff --git a/DOCS/interface-changes.rst b/DOCS/interface-changes.rst
index cf53a934a1..a33968307e 100644
--- a/DOCS/interface-changes.rst
+++ b/DOCS/interface-changes.rst
@@ -36,6 +36,12 @@ Interface changes
- remove --vo-defaults and --ao-defaults as well
- remove deprecated global sub-options (like -demuxer-rawaudio format=...),
use flat options (like --demuxer-rawaudio-format=...)
+ - the --sub-codepage option changes in incompatible ways:
+ - detector-selection and fallback syntax is deprecated
+ - enca/libguess are removed and deprecated (behave as if they hadn't
+ been compiled in)
+ - --sub-codepage=<codepage> does not force the codepage anymore
+ (this requires different and new syntax)
--- mpv 0.22.0 ---
- the "audio-device-list" property now sets empty device description to the
device name as a fallback
diff --git a/DOCS/man/options.rst b/DOCS/man/options.rst
index 3f7eb35b15..e5703cfca2 100644
--- a/DOCS/man/options.rst
+++ b/DOCS/man/options.rst
@@ -1754,66 +1754,31 @@ Subtitles
:all: Load all subs in the current and ``--sub-paths`` directories.
``--sub-codepage=<codepage>``
- If your system supports ``iconv(3)``, you can use this option to specify
- the subtitle codepage. By default, uchardet will be used to guess the
- charset. If mpv is not compiled with uchardet, enca will be used.
- If mpv is compiled with neither uchardet nor enca, ``UTF-8:UTF-8-BROKEN``
- is the default, which means it will try to use UTF-8, otherwise the
- ``UTF-8-BROKEN`` pseudo codepage (see below).
+ You can use this option to specify the subtitle codepage. uchardet will be
+ used to guess the charset. (If mpv was not compiled with uchardet, then
+ ``utf-8`` is the effective default.)
- The default value for this option is ``auto``, whose actual effect depends
- on whether ENCA is compiled.
+ The default value for this option is ``auto``, which enables autodetection.
- .. admonition:: Warning
-
- If you force the charset, even subtitles that are known to be
- UTF-8 will be recoded, which is perhaps not what you expect. Prefix
- codepages with ``utf8:`` if you want the codepage to be used only if the
- input is not valid UTF-8.
-
- .. admonition:: Examples
-
- - ``--sub-codepage=utf8:latin2`` Use Latin 2 if input is not UTF-8.
- - ``--sub-codepage=cp1250`` Always force recoding to cp1250.
-
- The pseudo codepage ``UTF-8-BROKEN`` is used internally. When it
- is the codepage, subtitles are interpreted as UTF-8 with "Latin 1" as
- fallback for bytes which are not valid UTF-8 sequences. iconv is
- never involved in this mode.
+ The following steps are taken to determine the final codepage, in order:
- If the player was compiled with ENCA support, you can control it with the
- following syntax:
-
- ``--sub-codepage=enca:<language>:<fallback codepage>``
-
- Language is specified using a two letter code to help ENCA detect
- the codepage automatically. If an invalid language code is
- entered, mpv will complain and list valid languages. (Note
- however that this list will only be printed when the conversion code is actually
- called, for example when loading an external subtitle). The
- fallback codepage is used if autodetection fails. If no fallback
- is specified, ``UTF-8-BROKEN`` is used.
+ - if the specified codepage has a ``+`` prefix, use that codepage
+ - if the data looks like UTF-8, assume it is UTF-8
+ - if ``--sub-codepage`` is set to a specific codepage, use that
+ - run uchardet, and if successful, use that
+ - otherwise, use ``UTF-8-BROKEN``
.. admonition:: Examples
- - ``--sub-codepage=enca:pl:cp1250`` guess the encoding, assuming the subtitles
- are Polish, fall back on cp1250
- - ``--sub-codepage=enca:pl`` guess the encoding for Polish, fall back on UTF-8.
- - ``--sub-codepage=enca`` try universal detection, fall back on UTF-8.
-
- If the player was compiled with libguess support, you can use it with:
-
- ``--sub-codepage=guess:<language>:<fallback codepage>``
-
- libguess always needs a language. There is no universal detection
- mode. Use ``--sub-codepage=guess:help`` to get a list of
- languages subject to the same caveat as with ENCA above.
-
- If the player was compiled with uchardet support you can use it with:
+ - ``--sub-codepage=latin2`` Use Latin 2 if input is not UTF-8.
+ - ``--sub-codepage=+cp1250`` Always force recoding to cp1250.
- ``--sub-codepage=uchardet``
+ The pseudo codepage ``UTF-8-BROKEN`` is used internally. If it's set,
+ subtitles are interpreted as UTF-8 with "Latin 1" as fallback for bytes
+ which are not valid UTF-8 sequences. iconv is never involved in this mode.
- This mode doesn't take language or fallback codepage.
+ This option changed in mpv 0.23.0. The old syntax is still emulated to some
+ degree.
``--sub-fix-timing``, ``--no-sub-fix-timing``
By default, subtitle timing is adjusted to remove minor gaps or overlaps
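The five-step order added to options.rst above can be modelled as a small function. This is a minimal Python sketch, not mpv's C implementation: `select_codepage`, `looks_like_utf8` and the `detect` callback are hypothetical names, "looks like UTF-8" is approximated by a strict UTF-8 decode, and `detect` stands in for uchardet (passing `None` models a build without uchardet).

```python
from typing import Callable, Optional

# Name of the internal pseudo codepage, as given in the documentation.
UTF8_BROKEN = "UTF-8-BROKEN"


def looks_like_utf8(data: bytes) -> bool:
    """Assumption: data "looks like UTF-8" if it decodes strictly as UTF-8.

    mpv's real check may use a different heuristic.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def select_codepage(
    option: str,
    data: bytes,
    detect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> str:
    """Pick a subtitle codepage following the documented order of steps.

    `option` is the --sub-codepage value. `detect` is a hypothetical
    stand-in for uchardet; None models a build without uchardet.
    """
    # 1. A leading "+" forces the codepage, skipping every other check.
    if option.startswith("+"):
        return option[1:]
    # 2. Data that looks like UTF-8 is assumed to be UTF-8.
    if looks_like_utf8(data):
        return "UTF-8"
    # 3. A specific codepage (anything other than "auto") is used as given.
    if option != "auto":
        return option
    # 4. Otherwise ask the detector, if one is available and succeeds.
    if detect is not None:
        guess = detect(data)
        if guess:
            return guess
    # 5. Last resort: UTF-8 with a Latin 1 fallback for invalid bytes.
    return UTF8_BROKEN
```

Because step 2 runs before step 3, `select_codepage("latin2", b"plain ascii")` still yields `UTF-8`, and `latin2` is only chosen for input that is not valid UTF-8; only the `+` prefix bypasses the UTF-8 check, which is why `+cp1250` is the forcing form.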
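The ``UTF-8-BROKEN`` behaviour described above (UTF-8, with Latin 1 as the fallback for bytes that are not part of a valid UTF-8 sequence) can be approximated in Python with a custom codec error handler. This is a sketch based only on that description, not mpv's code; the handler name `latin1-fallback` and the function `decode_utf8_broken` are hypothetical. Like the documented mode, it does not involve iconv.

```python
import codecs
from typing import Tuple


def _latin1_fallback(exc: UnicodeError) -> Tuple[str, int]:
    """Map the first offending byte to the Latin 1 character of equal value."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_byte = exc.object[exc.start]
    # Code points U+0000..U+00FF coincide with Latin 1, so chr() suffices.
    # Resume right after this single byte so that the remaining bytes of a
    # broken sequence are examined again one at a time.
    return chr(bad_byte), exc.start + 1


# Hypothetical handler name, registered once at import time.
codecs.register_error("latin1-fallback", _latin1_fallback)


def decode_utf8_broken(data: bytes) -> str:
    """Decode as UTF-8, falling back to Latin 1 for each invalid byte."""
    return data.decode("utf-8", errors="latin1-fallback")
```

Resuming at `exc.start + 1` rather than `exc.end` makes the fallback work byte by byte: a truncated sequence such as `b"\xe2\x82x"` decodes to `"\u00e2\u0082x"`, while valid UTF-8 text passes through unchanged.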