path: root/sub/filter_sdh.c
Commit message | Author | Age | Files | Lines
* filter_sdh: optimize get_char_bytes | Dudemanguy | 2024-01-15 | 1 | -5/+5
  strlen is only relevant if the length is less than the byte count, which is in [1, 4], so it can be replaced with strnlen, which only traverses characters up to maxlen instead of the entire string length. It also makes MPMIN unneeded. Also fix a comment.
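The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `first_char_bytes` and the lead-byte decoding are assumptions, and only the use of `strnlen` (a POSIX function) as the bound comes from the commit message.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes the first UTF-8 character of s
 * occupies, judged from its lead byte and clamped to the bytes that are
 * really present in the string. */
static size_t first_char_bytes(const char *s)
{
    assert(s != NULL);
    unsigned char lead = (unsigned char)s[0];
    size_t bytes = 1;                 /* ASCII, or anything unrecognized */
    if ((lead & 0xE0) == 0xC0)
        bytes = 2;
    else if ((lead & 0xF0) == 0xE0)
        bytes = 3;
    else if ((lead & 0xF8) == 0xF0)
        bytes = 4;
    /* strnlen looks at no more than `bytes` characters, so the result is
     * min(bytes, strlen(s)) without walking the whole string. */
    return strnlen(s, bytes);
}
```

With this sketch, a truncated sequence such as `"\xF0\x9F"` yields 2 rather than 4, so later pointer arithmetic cannot step past the terminator.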
* filter_sdh: fix incorrect placement of null terminator | Dudemanguy | 2024-01-12 | 1 | -1/+1
  The +1 here is not correct. For a 4-byte unicode character, this would throw a runtime error because the +1 would try to assign the null terminator past the actual bound of our array. Just remove it since it should be exactly equal to whatever we have for bytes.
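A small sketch of the off-by-one described above. The buffer size of 5 (four bytes plus the terminator), the macro `MAX_UTF8_BYTES`, and the helper `copy_first_char` are assumptions made for illustration; they are not taken from the source file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_UTF8_BYTES 4 /* assumed: longest UTF-8 sequence */

/* Hypothetical helper: copy the first `bytes` bytes of src into out and
 * terminate it. The caller must ensure src holds at least `bytes` bytes. */
static void copy_first_char(char out[MAX_UTF8_BYTES + 1], const char *src,
                            size_t bytes)
{
    assert(bytes <= MAX_UTF8_BYTES);
    memcpy(out, src, bytes);
    /* The terminator belongs at index `bytes`. Writing to out[bytes + 1]
     * would hit index 5 of a 5-element array when bytes == 4. */
    out[bytes] = '\0';
}
```

For a 4-byte character the valid indices are 0 through 4, so index `bytes` is the last slot of the array.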
* filter_sdh: sanitize get_char_bytes heuristic to avoid overflow | Dudemanguy | 2024-01-12 | 1 | -4/+4
  There's a simple check in filter_sdh that gets the bytes of the first character in a string in order to do pointer arithmetic to filter the string. The problem is that the number of bytes can be greater than the actual length of the string for certain unicode characters. This can't be worked with, so clamp the byte count to the strlen (taking the minimum of the two) to avoid overflow situations. Fixes #13237.
* filter_sdh: add full width parentheses to the enclosures string | Dudemanguy | 2023-12-08 | 1 | -5/+7
  Since these are technically parentheses, we'll treat them the same way as normal parentheses. Fixes #11155.
* filter_sdh: add --sub-filter-sdh-enclosures option | Dudemanguy | 2023-12-08 | 1 | -10/+62
  This filter is a bit complicated, but one of the essential parts of it is removing text enclosed by a particular set of characters (e.g. text in between []). This was previously hardcoded to only take into account parentheses and brackets, but people may want to filter more things, so make this customizable. The option only takes "left hand characters", so the right pair is mapped internally if applicable. If not, then we just use the same character. Fixes #8268 since the unicode character in question can just be passed to this option.
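The left-to-right mapping described above could look something like the sketch below. The function name `right_enclosure` and the contents of the pair table are assumptions for illustration; the actual table in the source may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: given a left-hand enclosure (as a UTF-8 string,
 * since it may be longer than one byte), return its right-hand partner.
 * Characters without a known partner close with themselves. */
static const char *right_enclosure(const char *left)
{
    assert(left != NULL);
    static const char *const pairs[][2] = {
        {"(", ")"},
        {"[", "]"},
        {"\xEF\xBC\x88", "\xEF\xBC\x89"}, /* full-width parentheses U+FF08/U+FF09 */
    };
    for (size_t i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++) {
        if (strcmp(left, pairs[i][0]) == 0)
            return pairs[i][1];
    }
    return left; /* no mapping: the same character opens and closes */
}
```

Under this sketch, passing `*` as an enclosure would remove text between two asterisks, while `[` still closes on `]`.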
* filter_sdh: combine skip_bracketed and skip_parenthesized | Dudemanguy | 2023-12-08 | 1 | -67/+28
  These two functions are almost exactly the same. The parenthesis variant is essentially just a special case with more conditions to not remove text. These can easily be combined together into one generic skip_enclosed function to handle both cases. We also use char * instead of char for the character comparison here since not everything is necessarily 1 byte and can fit into a char. This will be useful for the following commits where we extend this logic further.
* various: fix various typos in the code base | Alexander Seiler | 2023-03-28 | 1 | -2/+2
  Signed-off-by: Alexander Seiler <seileralex@gmail.com>
* various: remove trailing whitespace | Guido Cella | 2022-05-14 | 1 | -1/+1
* various: fix typos | Cœur | 2022-04-25 | 1 | -4/+4
* sub: SDH filter: use unified text-extraction utils | Avi Halachmi (:avih) | 2021-08-05 | 1 | -46/+17
* sub: SDH filter: small refinements | Avi Halachmi (:avih) | 2021-08-05 | 1 | -6/+10
  1. In the pathological case where event_format is NULL, the filter previously tried to use it with each new sub, and failed again each time. Now the filter gets disabled on init (event_format doesn't change).
  2. Previously, if the filter didn't modify the text or if the text could not be extracted, it still allocated a new packet with the same content. Now it returns the original, saving a whole lot of no-ops (there are still a few allocations in this case though).

  Item 1 above is preparation for the next commit, and item 2 was trivial, but there's more to do if anyone cares (NIH string functions instead of bstr, unused arguments, messages could be improved, and more).
* sub: make filter_sdh a "proper" filter, allow runtime changes | wm4 | 2020-02-16 | 1 | -9/+52
  Until now, filter_sdh was simply a function that was called by sd_ass directly (if enabled). I want to add another filter, so it's time to turn this into a somewhat more general subtitle filtering infrastructure.

  I pondered whether to reuse the audio/video filtering stuff - but better not. Also, since subtitles are horrible and tend to refuse proper abstraction, it's still messed into sd_ass, instead of working on the dec_sub.c level. Actually mpv used to have subtitle "filters" and even made subtitle converters part of it, but it was fairly horrible, so don't do that again.

  In addition, make runtime changes possible. Since this was supposed to be a quick hack, I just decided to put all subtitle filter options into a separate option group (=> simpler change notification), to manually push the change through the playloop (like it was sort of before for OSD options), and to recreate the sub filter chain completely on every change. Should be good enough.

  One strangeness is that due to prefetching and such, most subtitle packets (or those some time ahead) are actually done filtering when we change, so the user still needs to manually seek to actually refresh everything. And since subtitle data is usually cached in ASS_Track (for other terrible but user-friendly reasons), we also must clear the subtitle data, but of course only on seek, since otherwise all subtitles would just disappear. What a fucking mess, but such is life. We could trigger a "refresh seek" to make this more automatic, but I don't feel like it currently.

  This is slightly inefficient (lots of allocations and copying), but I decided that it doesn't matter. Could matter slightly for crazy ASS subtitles that render with thousands of events.

  Not very well tested. Still seems to work, but I didn't have many test cases.
* sub: enhance SDH filtering | Dan Oscarsson | 2020-02-09 | 1 | -0/+8
  It is not uncommon for a speaker label to have [xxxx] inside. These should also be filtered out.
* sub: recognize UTF-8 characters in SDH subtitle filter | zc62 | 2019-03-02 | 1 | -1/+4
  Only printable ASCII characters were considered to be valid text. Make it possible for UTF-8 content to be considered valid as well.

  This does not make the SDH subtitle filter support non-English languages. It just prevents the filter from blindly marking lines that have only UTF-8 characters as empty.

  Fixes #6502
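As a rough sketch of the check described above, a line can be treated as containing text if it has any printable non-space ASCII byte or any byte with the high bit set (which is how multi-byte UTF-8 sequences are encoded). The function `line_has_text` and the exact byte thresholds are assumptions; the condition used in the actual commit may be narrower.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does the line contain anything that should count
 * as text, so that the filter does not treat it as empty? */
static bool line_has_text(const char *line)
{
    assert(line != NULL);
    for (const unsigned char *p = (const unsigned char *)line; *p; p++) {
        bool printable_ascii = *p > 0x20 && *p < 0x7F; /* excludes space and DEL */
        bool utf8_byte = *p >= 0x80;                   /* part of a multi-byte sequence */
        if (printable_ascii || utf8_byte)
            return true;
    }
    return false;
}
```

This only decides whether a line is empty. It does not teach the filter anything about non-English speaker labels or sound descriptions.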
* filter_sdh: remove pointless set_pos function | wm4 | 2017-04-20 | 1 | -17/+10
  This change was requested during patch review, but apparently it was overlooked on merge.
* filter_sdh: change license to LGPL | wm4 | 2017-04-20 | 1 | -7/+7
* sub: minor sdh filter fixes | Dan Oscarsson | 2017-04-15 | 1 | -1/+2
  When doing harder filtering, not requiring a space after : results in lines with a clock time (like 10:05) being taken as a speaker label. So require a space after : even when doing harder filtering, as a missing space is very uncommon.

  Some like to add text in parentheses in the speaker label, like XXX (loud): or just (loud): so allow parentheses when doing harder filtering.
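The colon rule can be illustrated with a deliberately simplified sketch. The function `looks_like_speaker_label` is hypothetical and checks only the "space after the colon" condition from the commit message; the real filter applies further conditions to what may appear before the colon.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified check: the first ':' in the line only marks a
 * speaker label if it is immediately followed by a space. */
static bool looks_like_speaker_label(const char *line)
{
    assert(line != NULL);
    const char *colon = strchr(line, ':');
    return colon != NULL && colon[1] == ' ';
}
```

Under this rule `10:05` is left alone because a digit follows the colon, while `XXX (loud): Hey` still qualifies as a label candidate.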
* sub: add SDH subtitle filter | Dan Oscarsson | 2017-03-25 | 1 | -0/+459
  Add a subtitle filter to remove additions for the deaf or hard-of-hearing (SDH). This is for English, but may in part work for other languages too.

  This is an ASS filter, and the intention is that it can always be enabled, as by default it does not remove parts that may be normal text. Harder filtering can be enabled with an additional option.

  Signed-off-by: wm4 <wm4@nowhere>