author | wm4 <wm4@nowhere> | 2020-02-03 19:13:44 +0100 |
---|---|---|
committer | wm4 <wm4@nowhere> | 2020-02-03 19:13:44 +0100 |
commit | cbee577d0a787b7c8e329ef6d4fc8e37c05e9786 (patch) | |
tree | 7d079fd4fdf4fed27dca32d292ffd98aeaa18f6a /osdep | |
parent | 13624b5c7a54c743215cfb050519ff3907418cc6 (diff) | |
cue: tolerate NBSP as whitespace
Apparently such .cue files exist. They fail both probing and parsing. To
make it worse, the sample at hand was encoded as Latin1.
One part of this is replacing bstr_lstrip() with a version that supports
NBSP. One could argue that bstr_lstrip() should always do this, but I
don't want to overdo it. There are many more Unicode abominations that
it could arguably be expected to handle, so it will stay ASCII-only instead
of going down this rabbit hole. I'm just assuming this cue sheet was
generated by some stupid software that inexplicably liked NBSPs (which
is how we justify a one-off fix). The new lstrip_whitespace() doesn't
look particularly efficient, but it doesn't have to be.
The second part is dealing with the fact that the charset is not
necessarily UTF-8. We don't want to do charset conversion before probing
has decided it's a cue sheet (that would probably make it more fragile all
around), so just make it work with Latin1 by assuming invalid UTF-8
sequences are Latin1. This fallback is part of why lstrip_whitespace() is
sort of roundabout.
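As a rough illustration of the two parts described above, here is a minimal
sketch, not the code from this commit. It assumes a small validating UTF-8
decoder that falls back to treating any invalid byte as a single Latin1
character, and strips leading whitespace including NBSP (U+00A0) on top of
that. The names decode_utf8_or_latin1(), is_cue_space() and
lstrip_whitespace_len() are hypothetical and do not exist in mpv.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: decode one code point from s[0..len), len > 0.
// Returns the number of bytes consumed and stores the code point in *cp.
// Anything that is not valid UTF-8 is assumed to be a single Latin1 byte.
static size_t decode_utf8_or_latin1(const unsigned char *s, size_t len,
                                    uint32_t *cp)
{
    unsigned char b = s[0];
    size_t need;        // number of continuation bytes expected
    uint32_t c, min;    // accumulated value, smallest non-overlong value

    *cp = b;            // default result: the byte itself (ASCII or Latin1)
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0) {
        need = 1; c = b & 0x1F; min = 0x80;
    } else if ((b & 0xF0) == 0xE0) {
        need = 2; c = b & 0x0F; min = 0x800;
    } else if ((b & 0xF8) == 0xF0) {
        need = 3; c = b & 0x07; min = 0x10000;
    } else {
        return 1;       // stray continuation byte or invalid lead byte
    }
    if (len < need + 1)
        return 1;       // truncated sequence
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 1;   // not a continuation byte
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 1;       // overlong, out of range, or surrogate
    *cp = c;
    return need + 1;
}

// ASCII whitespace plus NBSP (U+00A0).
static bool is_cue_space(uint32_t cp)
{
    return cp == ' ' || (cp >= '\t' && cp <= '\r') || cp == 0xA0;
}

// Returns how many leading bytes of data[0..len) are whitespace.
static size_t lstrip_whitespace_len(const char *data, size_t len)
{
    const unsigned char *s = (const unsigned char *)data;
    size_t pos = 0;
    while (pos < len) {
        uint32_t cp;
        size_t n = decode_utf8_or_latin1(s + pos, len - pos, &cp);
        if (!is_cue_space(cp))
            break;
        pos += n;
    }
    return pos;
}
```

With this sketch, a line starting with the UTF-8 bytes 0xC2 0xA0 loses two
bytes, and a line starting with a lone 0xA0 (Latin1 NBSP) loses one. Note
that the Latin1 text "Â" followed by NBSP is the same byte pair as a UTF-8
NBSP, so the UTF-8 reading wins there.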
(You could still rewrite it as a much more efficient state machine,
instead of using a slow and validating UTF-8 parser that is called per
codepoint. Starting to overthink this.)
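For comparison, a byte-level scan in the spirit of that remark might look
like the sketch below. It is only an assumption about what such a rewrite
could be, not something this commit contains. It never decodes code points:
it only recognizes ASCII whitespace, the UTF-8 NBSP byte pair 0xC2 0xA0, and
a lone 0xA0 byte, which is assumed to be a Latin1 NBSP. The function name
lstrip_whitespace_fast() is hypothetical.

```c
#include <stddef.h>

// Returns how many leading bytes of data[0..len) are whitespace,
// without running a UTF-8 decoder per code point.
static size_t lstrip_whitespace_fast(const char *data, size_t len)
{
    const unsigned char *s = (const unsigned char *)data;
    size_t pos = 0;
    while (pos < len) {
        unsigned char b = s[pos];
        if (b == ' ' || (b >= '\t' && b <= '\r')) {
            pos += 1;   // ASCII whitespace: space, \t, \n, \v, \f, \r
        } else if (b == 0xC2 && pos + 1 < len && s[pos + 1] == 0xA0) {
            pos += 2;   // NBSP encoded as UTF-8
        } else if (b == 0xA0) {
            pos += 1;   // lone 0xA0: assumed to be a Latin1 NBSP
        } else {
            break;      // first non-whitespace byte
        }
    }
    return pos;
}
```

Because the scan always stops or advances on whole units, a 0xA0 seen at the
top of the loop can never be the tail of a valid UTF-8 sequence, which is
what makes the Latin1 assumption safe for this narrow purpose.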
Multimedia is terrible. Legacy charsets are terrible. Everything is
terrible.
Fixes: #7429