Instead of having to rely on the protocol matching, make a function that
creates a stream from a stream_info_t directly. Instead of going through
a weird indirection with STREAM_CTRL, add a direct argument for non-text
arguments to the open callback. Instead of creating a weird dummy
mpv_global, just pass an existing one from all callers. (The latter one
is just an artifact from the past, where mpv_global wasn't available
everywhere.)
Actually I just wanted a function that creates a stream without any of
that bullshit. This goal was slightly missed, since you still need this
heavy "constructor" just to setup a shitty struct with some shitty
callbacks.
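For illustration, a minimal sketch of what the new entry point amounts
to - the names and fields here are approximate, not a verbatim copy of
the headers:

    // Open a stream directly from its stream_info_t; the typed argument
    // struct replaces the old STREAM_CTRL indirection, and the caller
    // passes in its existing mpv_global.
    struct stream_open_args {
        struct mpv_global *global;
        const char *url;
        int flags;
        void *special_arg;  // non-text argument for the open callback
    };

    int stream_create_instance(const stream_info_t *sinfo,
                               struct stream_open_args *args,
                               struct stream **ret);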
|
The previous commit broke audio playback (maybe this is what 4. was
about?). But it wasn't the fault of the commit; it just exposed
pre-existing issues.
If the packet queue search couldn't get all packets, it checked
queue->is_bof to see whether there could be previous packets. But
obviously, is_bof can be set, even if the search start packet wasn't the
first one.
This was especially observable with audio, because audio by default
needs preroll packets, and plays artifacts if they're missing.
Fix by using the BOF playback condition for this purpose too.
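Schematically, the broken check vs. the fix (a sketch with approximate
names, not the literal diff):

    // Before: "no earlier packets exist" was concluded whenever the
    // queue had seen BOF, regardless of where the search started.
    bool no_earlier_packets = queue->is_bof;

    // After: BOF only counts if the search actually started at the first
    // packet of the queue - the same condition BOF playback uses.
    bool no_earlier_packets = queue->is_bof && search_start == queue->head;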
|
In theory, backward demuxing does way too much work by doing a full
cache seek every time you need to step back through a packet queue. In
theory, it would be vastly more efficient to just iterate backwards
through the queue, but this is not possible because I'm too stingy to
add 8 bytes per packet for backlinks. (In theory, you could probably
come up with some sort of deque, that'd allow efficient iteration into
any direction, but other requirements make this tricky, and I'm
currently too dumb/lazy to do this. For example, the queue can grow to
millions of elements, all while packet pointers need to stay valid.)
Another possibility is to "locally" seek the queue. This still has less
overhead than a full seek.
Also, it just so happens that, as a side effect, this avoids performing
range merging, which commit f4b0e7e942 "broke". That wasn't a bug at
all, but since range joining is relatively slow, avoiding it is good.
This is really just a coincidental side effect, I'm not even quite sure
why it happens this way.
There are 4 ugly things about this change:
1. To get a keyframe "before" a certain one, we recompute the target
PTS, and then subtract 0.001 as an arbitrary number to "fudge" it (see
the sketch after this list). This isn't the first place where this is
done, and hey, it wasn't my damn idea that MPlayer should use floats for
timestamps. (At first, it even used 32 bit timestamps.)
2. This is the first time reader_head is reset to an earlier position
outside of the seek code. This might cause conceptual problems since
this code is now "duplicated".
3. In theory, find_backward_restart_pos() needs to be run again after
the backstep. Normally, the seek code calls it explicitly. We could call
it right in the new code, but then the damn function would be recursive.
We could shuffle the code around to make it a loop, but even then
there'd be an off chance of running into an unexpected corner case (aka
subtle bug), where it would loop forever. To avoid refactoring the code
and having to think too hard about it, make it deferred - add some new
state and the check_backward_seek() function for this. Great, even more
subtle mutable state for this backwards shit.
4. I forgot this one, but I can assure you, it's bad.
Without doubt someone will have to clean up this slightly in the future
(or rip it out), but it won't be me.
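As a sketch of point 1. (the helper name is illustrative, not the
actual function in demux.c):

    // Recompute the target PTS, then nudge it down so that float
    // rounding can't make the search land on the very keyframe we're
    // stepping before.
    double target = recompute_keyframe_target_pts(queue);
    if (target != MP_NOPTS_VALUE)
        target -= 0.001;  // arbitrary fudge; timestamps are doubles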
|
This code tries to determine the "current" position, which is used as
base for the seek target when it needs to seek back more (the point is
to prevent seeking back too far). But compute_keyframe_times() almost
computes the same thing, so use that. Unfortunately needs a forward
declaration.
("Almost", because it differs in some details that should not really
matter.)
|
demux_packet.next should not be used outside of demux.c, and in this
case it's a packet that was just passed to demux.c from the outside.
demux_packet.stream is already set by the demuxer, and this is assured
by the add_packet_locked() caller.
|
The new location makes equally much sense (or more, since it's close to
its per-stream companion function), and we don't need a forward
declaration.
|
We don't care much about this case, because backward playback can fail
terribly without a good way to detect it, so this was fine.
However, this froze in certain situations. Reading from a subtitle file
for which backward demuxing failed could make it get stuck in
demux_read_packet_async() in unthreaded mode. (That we don't support
backwards subtitle decoding anyway doesn't matter for this.)
So aggressively disable backward demuxing to prevent worse in these
situations. The behavior will still be awful, because the frontend is
still in backwards playback mode, but at least it won't freeze.
|
Somewhat similar to the old --cache-file, except for the demuxer cache.
Instead of keeping packet data in memory, it's written to disk and read
back when needed.
The idea is to reduce main memory usage, while allowing fast seeking in
large cached network streams (especially live streams). Keeping the
packet metadata on disk would be rather hard (would use mmap or so, or
rewrite the entire demux.c packet queue handling), and since it's
relatively small, just keep it in memory.
Also for simplicity, the disk cache is append-only. If you're watching
really long livestreams, and need pruning, you're probably out of luck.
This still could be improved by trying to free unused blocks with
fallocate(), but since we're writing multiple streams in an interleaved
manner, this is slightly hard.
Some rather gross ugliness in packet.h: we want to store the file
position of the cached data somewhere, but on 32 bit architectures, we
don't have any usable 64 bit members for this, just the buf/len fields,
which add up to 64 bit - so the shitty union aliases this memory.
Error paths untested. Side data (the complicated part of trying to
serialize ffmpeg packets) untested.
Stream recording had to be adjusted. Some minor details change due to
this, but probably nothing important.
The change in attempt_range_joining() is because packets in cache
have no valid len field. It was a useful check (heuristically
finding broken cases), but not a necessary one.
Various other approaches were tried. It would be interesting to list
them and to mention the pros and cons, but I don't feel like it.
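The aliasing in packet.h looks roughly like this (field names
illustrative). On 32 bit, the pointer and the size_t are 4 bytes each,
which together provide the 8 bytes needed for the file position:

    struct demux_packet {
        // ...
        union {
            struct {
                uint8_t *buf;      // packet data, if held in memory
                size_t len;
            };
            uint64_t cached_data;  // file position, if in the disk cache
        };
        // ...
    };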
|
Backwards demuxing usually seeks back by a "random" amount (set by
a user option) when it needs new preceding packets. It turns out a past
change made these backwards seek amounts add up when it didn't need to
(i.e. subtracting the amount from the seek pos without properly
resetting it), which could possibly slow down playback as it went on.
The reason for this was that back_seek_pos was set for every stream on
every seek. This made the reset not affect other streams (in particular
streams which weren't used and never were reset, or which didn't reset
that often). But as the commit adding it showed, this is needed only to
set the initial position. So do that.
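In other words, something like this (a sketch, names approximate):

    // Initialize back_seek_pos only if the stream doesn't have a
    // position yet, instead of overwriting it for every stream on
    // every seek.
    if (ds->back_seek_pos == MP_NOPTS_VALUE)
        ds->back_seek_pos = seek_pts;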
Fixes: "demux: fix initial backward demuxing state in some cases"
|
When packet appending sets the start of the range, it adjusts the range
by seek_preroll. Do this when packets are pruned from the start of the
range too.
(Yeah, seek_preroll handling is probably broken in some other cases. It
was halfhearted to begin with.)
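Conceptually (a sketch, names approximate): the usable range start has
to move forward, since a position whose preroll packets were pruned can
no longer be decoded correctly:

    // Mirror what packet appending does when it sets the range start.
    queue->seek_start = new_first_pts + ds->sh->seek_preroll;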
|
The main thing this commit does is removing demux_packet.kf_seek_pts. It
gets rid of 8 bytes per packet. Which doesn't matter, but whatever.
This field was involved with much of seek range updating and pruning,
because it tracked the canonical seek PTS (i.e. start PTS) of a packet
range. We have to deal with timestamp reordering, and assume the start
PTS is the lowest PTS across all packets (not necessarily just the first
packet). So knowing this PTS requires looping over all packets of a
range (no, the demuxer isn't going to tell us, that would be too sane).
Having this as packet field was perfectly fine. I'm just removing it
because I started hating extra packet fields recently.
Before this commit, this value was cached in the kf_seek_pts field (and
computed "incrementally" when adding packets). This commit computes the
value on demand (compute_keyframe_times()) by iterating over the packet
list. There is some similarity with the state before 10d0963d851fa,
where I introduced the kf_seek_pts field - maybe I'm just moving in
circles. The commit message claims something about quadratic complexity,
but if the code before that had this problem, this new commit doesn't
reintroduce it, at least. (See below.)
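A sketch of what compute_keyframe_times() boils down to (the signature
and field details are approximate):

    // The seek PTS of a keyframe range is the minimum PTS across all of
    // its packets (timestamp reordering!), so walk from the keyframe
    // packet up to, but not including, the next keyframe packet.
    static void compute_keyframe_times(struct demux_packet *pkt,
                                       double *out_kf_min,
                                       double *out_kf_max)
    {
        *out_kf_min = MP_NOPTS_VALUE;
        *out_kf_max = MP_NOPTS_VALUE;
        for (struct demux_packet *p = pkt; p; p = p->next) {
            if (p != pkt && p->keyframe)
                break;  // start of the next keyframe range
            double ts = p->pts == MP_NOPTS_VALUE ? p->dts : p->pts;
            if (ts == MP_NOPTS_VALUE)
                continue;
            if (*out_kf_min == MP_NOPTS_VALUE || ts < *out_kf_min)
                *out_kf_min = ts;
            if (*out_kf_max == MP_NOPTS_VALUE || ts > *out_kf_max)
                *out_kf_max = ts;
        }
    }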
The pruning logic is simplified (I think?) - there is no "incremental"
cached pruning decision anymore (next_prune_target is removed), and
instead it simply prunes until the next keyframe like it's supposed to.
I think this incremental stuff was only there because of very old code
that got refactored away before. I don't even know what I was thinking
there, it just seems complex. Now the seek range is updated when a
keyframe packet is removed.
Instead of using the kf_seek_pts field, queue->seek_start is used to
determine the stream with the lowest timestamp, which should be pruned
first. This is different, but should work well. Doing the same as the
previous code would require compute_keyframe_times(), which would
introduce quadratic complexity.
On the other hand, it's fine to call compute_keyframe_times() when the
seek range is recomputed on pruning, because this is called only once
per removed keyframe packet. Effectively, this will iterate over the
packet list twice instead of once, and with some locality. The same
happens when packets are appended - it loops over the recently added
packets once again. (And not more often, which would go above linear
complexity.)
This introduces some "cleverness" with avoiding calling
update_seek_ranges() even when keyframe packets are added/removed, which
not really tightly coupled to the new code, and could have been in a
separate commit.
Removing next_prune_target achieves the same as commit b275232141f56,
which is hereby reverted (stale is_bof flags prevent seeking before the
current range, even if the beginning of the file was pruned). The seek
range is now strictly computed after at least one packet was removed,
and stale state should not be possible anymore.
Range joining may over-allocate the index a little. It tried hard to
avoid this before by explicitly freeing the old index before creating a
new one. Now it iterates over the old index while adding the entries to
the new one, which is simpler, but may allocate twice the memory in the
worst case. It's not going to matter for anything, though.
Seeking will be slightly slower. It needs to compute the seek PTS values
across all packets in the vicinity of the seek target. The previous code
also iterated over these packets, but now it iterates one packet range
more.
Another minor detail is that the special seeking code for SEEK_FORWARD
goes away. The seeking code will now iterate over the very last packet
range too, even if it's incomplete (i.e. packets are still being
appended to it). It's fine that it touches the incomplete range, because
the seek_end fields prevent anything particularly incorrect from
happening. On the other hand, SEEK_FORWARD can now consider this as a seek
target, which the deleted code had to do explicitly, as kf_seek_pts was
unset for incomplete packet ranges.
|
Obviously doesn't make sense in this order. The git history shows that
this comment was touched multiple times, without ever fixing it. It was
originally added in 2016, when the "for" was missing. Later, the "for"
was added, but in the wrong position.
What the fuck?
|
Just for readability purposes. Although the field is mutable, it never
changes within the function.
|
The old implementation didn't work for the OGG case. Discard the old
shit code (instead of fixing it), and write new shit code. The old code
was already over a year old, so it's about time to rewrite it for no
reason anyway.
While it's true that the old code appears to be broken, the main reason
to rewrite this is to make it simpler. While the amount of code seems to
be about the same, both the concept and the actual tag handling are
simpler. The result is probably a bit more correct.
The packet struct shrinks by 8 bytes. The fact that it wasted 8 bytes
per packet for a rather obscure use case was the reason I started this
at all (and when I found that OGG updates didn't work). While these 8
bytes aren't going to hurt, the packet struct was getting too bloated.
If you buffer a lot of data, these extra fields will add up. Still quite
some effort for 8 bytes. Fortunately, it's not like there are any
managers that need to be convinced whether it's worth doing. The freedom
to waste time on dumb shit.
The old implementation attached the current metadata to each packet.
When the decoder read the packet, the packet's metadata was made
current. The new implementation stores metadata as separate list, and
requires that the player frontend tells it the current playback time,
which will be used to find the currently valid metadata. In both cases,
the objective was to correctly update metadata even if a lot of data is
buffered ahead (and to update them correctly when seeking within the
demuxer cache).
The new implementation is actually slightly more correct, because it
uses the playback time for the metadata lookup. Consider if you have an
audio filter which buffers 15 seconds (unfortunately such a filter
exists), then the old code would update the current title 15 seconds too
early, while the new one does it correctly.
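A sketch of the new scheme (struct and function names illustrative):

    struct timed_metadata {
        double pts;            // time from which this update is valid
        struct mp_tags *tags;  // merged global/per-stream/ICY tags
    };

    // Return the last update at or before the frontend's playback time.
    static struct timed_metadata *
    lookup_timed_metadata(struct demux_internal *in, double pb_time)
    {
        struct timed_metadata *cur = NULL;
        for (int n = 0; n < in->num_timed_metadata; n++) {
            if (in->timed_metadata[n]->pts > pb_time)
                break;
            cur = in->timed_metadata[n];
        }
        return cur;
    }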
The new code also simplifies mixing the 3 metadata sources (global, per
stream, ICY). We assume these aren't mixed in a meaningful way. The old
code tried to be a bit more "exact". I didn't bother to look how the old
code did this, but the new code simply always "merges" with the previous
metadata, so if a newer tag removes a field, it's going to stick around
anyway.
I tried to keep it simple. Other approaches include making metadata a
special sh_stream with metadata packets. This would have been
conceptually clean, but the implementation would probably have been
unnatural (and doesn't match well with libavformat's API anyway). It
would have been nice to make the metadata updates chapter points (makes
a lot of sense for the intended use case, web radio current song
information), but I don't think it would have been a good idea to make
chapters suddenly so dynamic. (Still an idea to keep in mind; the new
code actually makes it easier to work towards this.)
You could mention how subtitles are timed metadata, and actually are
implemented as sparse packet streams in some formats. mp4 implements
chapters as a special subtitle stream, AFAIK. (Ironically, this is very
not-ideal for files. It would be useful for streaming like web radio,
but mp4 is extremely bad for streaming by design for other reasons.)
|
Remove the duplicated creation of the first range. Explicitly destroy
ranges, including the last one on final deinit.
It looks like this also fixes a leak of removed range structs, which was
never noticed because they're so small, and were freed on final deinit
due to having the demuxer as talloc parent.
This improves upon the previous commit too (that change should have
been part of it I guess). Sub-demuxers (demux_timeline only) now
automatically don't use the cache (like it was intended by the previous
commit). The cache is "initialized" (or disabled) last in the recursive
call chain, which is messy, but this sub demuxer stuff FUCKING SUCKS, as
mentioned in the previous commit message. This would be no problem if
the caching layer and actual demuxer implementations were separate.
Most of this change has no purpose. Might make (de-)initialization of
further cache experiments simpler.
|
It seems the so called demuxer cache wasn't really disabled for
sub-demuxers (timeline stuff). This was relatively harmless, since the
actual packet data was shared anyway via refcounting. But with the
addition of a mmap cache backend, this may change a lot.
So strictly disable any caching for sub-demuxers. This assumes that
users of sub-demuxers (only demux_timeline.c by now?) strictly use
demux_read_any_packet(), since demux_read_packet_async() will require
some minor read-ahead if a low level packet read returned a packet for a
different stream.
This requires some awkward messing with this fucking heap of trash. The
thing that is really wrong here is that the demuxer API mixes different
concepts, and sub-demuxers get the same API as decoders, and use the
cache code.
|
The demuxer cache tries to track the number of bytes allocated for the
cache. In addition to the packet queue, the seek index is another data
structure that roughly depends on the amount of packets cached. So the
index size should somehow be part of the total number of bytes tracking.
Until now, this was handled with KF_SEEK_ENTRY_WORST_CASE, basically a
shitty heuristic. It was a guess (and probably rather an upper bound
than a lower bound). The implementation details made it annoying, and it
was conceptually inaccurate too.
Change this, and instead simply add the index size to the total cache
size. This essentially makes it part of the backbuffer. It's nice that
this cleanly decouples it from the packet size tracking itself.
Since it's part of the backbuffer number of bytes now, packet pruning
can't necessarily free enough space in the backbuffer anymore. Before
this commit, the backbuffer consisted of packets only, so it was
possible to reduce its size to 0 by pruning all packets until the
decoder reader position, at which point a packet was accounted as
forward buffered. Now the index is added to this, and it can't be
pruned. Replace the assert() because of this changed invariant.
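That is, roughly (names illustrative):

    // Charge the index allocation to the backbuffer byte count directly,
    // instead of estimating a per-entry worst case for every packet.
    size_t total_bytes = packet_bytes +
                         queue->index_size * sizeof(queue->index[0]);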
|
Why not. struct demux_packet doesn't change in size on 64 bit due to
alignment padding.
|
Someone who rams a knife into his own hand just to see what happens is
normally put in a psychiatric ward. But in software, this is acceptable
behavior. Programs are not supposed to crash just because a user did
something unreasonably dumb.
Switching tracks during backward playback is such a thing. It triggered
an assertion because the newly enabled stream was not properly
initialized for backward playback. Fix this, and make it actually work
(mostly; it still takes a "while" until playback recovers fully).
This actually makes some aspects of initialization slightly cleaner.
|
Not sure if this is bug-free. You _always_ make bugs when writing a
binary search from scratch (and such is the curse of C, though if I did
this in C++ it would probably end in blood). It seems to work though,
checking against the normal linear search.
It's slightly faster. Not much.
I wonder if the termination condition can be written in a nicer/elegant
way. I guess the fact that it's not a == predicate makes this slightly
messier?
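For reference, the shape of the search (a sketch over a plain sorted
array; the real index is a ringbuffer). The predicate is <= rather
than ==: we want the last entry whose pts is at or below the seek
target, i.e. an upper-bound style search:

    // Find the last index entry with pts <= target_pts; -1 if none.
    static ptrdiff_t index_search(struct index_entry *e, size_t num,
                                  double target_pts)
    {
        size_t lo = 0, hi = num;  // half-open range [lo, hi)
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (e[mid].pts <= target_pts) {
                lo = mid + 1;  // candidate; a later one may still qualify
            } else {
                hi = mid;      // past the target; discard this half
            }
        }
        return (ptrdiff_t)lo - 1;
    }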
|
The purpose of the seek index is to avoid having to iterate over the
full linked list of cached packets, which should speed up seeking. Until
now, there was an excuse of a seek index, that didn't really work.
The idea of the old index was nice: have a fixed number of entries (no
need to worry about exceeding memory requirements), which are
"stretched" out as the cache gets bigger. The size of it was 16 entries,
which in theory should speed up seeking by the factor 16, given evenly
spaced out entries. To achieve this even spacing, it attempted to "thin
out" the index by half once the index was full (see e.g. index_distance
field). In my observations this didn't really work, and the distribution
of the index entries was very uneven. Effectively, this did nothing. It
probably worked once and I can't be assed to debug my own shit code.
Writing new shit code is more fun.
Write new shit code for fun. This time it's a complete index. It's kept
in a ringbuffer (for easier LIFO style appending/removing), which is
resized with realloc if it becomes too small.
Actually, the index is not completely complete; it's still "thinned
out" by a hardcoded time value (INDEX_STEP_SIZE). This should help with
things like audio or crazy subtitle tracks (do they still create
those?), where we can just iterate over a small part of the linked
packet list to get the exact seek position. For example, for AAC audio
tracks with typical samplerates/framesizes we'd iterate about 50 packets
in the linked list.
The results are good. Seeking in large caches is much faster now,
apparently at least 1 or 2 orders of magnitude. Part of this is because
we don't need to touch every damn packet in a huge linked list (bad
cache behavior - the index is a linear memory region instead), but
"thinning" out the search space also helps. Both aspects can be easily
tested (setting INDEX_STEP_SIZE to 0, and replacing e->pts with
e->pkt->kf_seek_pts in find_seek_target()).
This does use more memory of course. In theory, we could tolerate memory
allocation failures (the index is optional and only for performance),
but I didn't bother and inserted an apologetic comment instead (have fun
with the shit code). The memory usage doesn't seem to be that bad,
though. Due to INDEX_STEP_SIZE it's bounded by the file duration too.
Try to account for the additional memory usage with an approximation
(see KF_SEEK_ENTRY_WORST_CASE). It's still a bit different, because the
index needs a single, potentially large allocation.
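A self-contained sketch of the ringbuffer append path (the struct
layout and all names here are illustrative, not the actual demux.c
code):

    #include <stdbool.h>
    #include <stdlib.h>

    struct index_entry {
        double pts;
        struct demux_packet *pkt;
    };

    struct demux_queue {
        struct index_entry *index;  // ringbuffer storage
        size_t index_size;          // allocated entries
        size_t index0;              // offset of the first used entry
        size_t num_index;           // number of used entries
    };

    // Append one entry; grow the buffer when full. Returns false on
    // allocation failure - the index is optional (performance only), so
    // the caller can drop it and fall back to the linear packet search.
    static bool index_append(struct demux_queue *q, double pts,
                             struct demux_packet *pkt)
    {
        if (q->num_index == q->index_size) {
            size_t new_size = q->index_size ? q->index_size * 2 : 32;
            struct index_entry *n = malloc(new_size * sizeof(*n));
            if (!n)
                return false;
            // Re-linearize the old ringbuffer into the new allocation.
            for (size_t i = 0; i < q->num_index; i++)
                n[i] = q->index[(q->index0 + i) % q->index_size];
            free(q->index);
            q->index = n;
            q->index_size = new_size;
            q->index0 = 0;
        }
        size_t pos = (q->index0 + q->num_index++) % q->index_size;
        q->index[pos] = (struct index_entry){pts, pkt};
        return true;
    }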
|
The search was slightly more complicated and slow than it had to be. It
didn't assume that the packet list was sorted, which is responsible for
much of this. (I think the search code was borrowed from demux_mkv.c,
which does not sort index entries.)
There was a half-hearted attempt to make it exit early, but it was
mostly ineffective.
Simplify the code based on the assumption that the list is sorted. This
will exit the search loop once the worst-case candidate entry has been
checked.
|
Mostly about the packet queue and the subtitle handling of it.
(This mess sure sounds like a good argument to give up the separate
stream queues, and using a single packet queue per cached range.)
|
This message would always show "correct_dts=0 correct_pos=0".
|
Until now, this usually passed a single audio frame to the decoder, and
then did a backstep operation (cache seek + frame search) again. This is
probably not very efficient, especially considering it has to search the
packet queue from the "start" every time.
Also, with most audio codecs, an additional "preroll" frame was passed
first. In these cases, the preroll frame would make up 50% of audio
decoding time. Also not very efficient.
Attempt to fix this by returning multiple frames at once. This reduces
the number of backstep operations and the ratio of preroll frames. In
theory, this should help efficiency. I didn't test it though; why would
I do this? It's just a pain. Set it to an unscientific 10 frames.
(Actually, these are 10 keyframes, so it's much more for codecs like
TrueHD. But I don't care about TrueHD.)
This commit changes some other implementation details. Since we can
return more than 1 non-preroll keyframe to the decoder, some new state
is needed to remember how much. The resume packet search is adjusted to
find N ("total") keyframe packets in general, not just preroll frames.
I'm removing the special case for 1 preroll packet; audio used this, but
doesn't anymore, and it's premature optimization anyway.
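The adjusted resume search, schematically (a sketch; names and the
option wiring are approximate):

    // Restart decoding only once 'total' keyframe packets are available,
    // where total = preroll keyframes + the new batch size (e.g. 10).
    static bool have_enough_keyframes(struct demux_packet *start, int total)
    {
        int found = 0;
        for (struct demux_packet *p = start; p && found < total; p = p->next) {
            if (p->keyframe)
                found++;
        }
        return found >= total;
    }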
Expose the new mechanism with 2 new options. They're almost completely
pointless, since nobody will try them, and if they do, they won't
understand what these options truly do. And if they actually do, they
most likely would be capable of editing the source code, and we could
just hardcode the parameters. Just so you know that I know that the
added options are pointless.
The following two things are truly unrelated to this commit, and more
like general refactoring, but fortunately nobody can stop me.
Don't set back_seek_pos in dequeue_packet() anymore. This was sort of
pointless, since it was set in find_backward_restart_pos() anyway (using
some of the same packets). The latter function tries to restrict this to
the first keyframe range though, which is an optimization that in theory
might break with broken files (duh), but in these cases a lot of other
things would be broken anyway.
Don't set back_restart_* in dequeue_packet(). I think this is an
artifact of the old restart code (cf. ad9e473c555). It can be done
directly in find_backward_restart_pos() now. Although this adds another
shitty packet search loop, I prefer this, because it's clearer what's
actually happening.
|
May as well be part of the previous commit.
|
The size of all forward buffered packets is used to control maximum
buffering.
Until now, this size was incrementally adjusted, but had to be
recomputed on seeks within the cache. Doing this was actually pretty
expensive. It iterates over a linked list of separate memory allocations
(which are probably spread all over the heap due to the allocation
behavior), and the demux_packet_estimate_total_size() call touches a lot
of further memory locations. I guess this affects the cache rather
negatively. In an unscientific test, the recompute_buffers() function
(which contained this loop) was