summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vo_gpu: never pass flipped images to ra or ra backendswm42017-11-101-2/+7
| | | | | | | | | | | | | | | | | | | Seems like the last refactor to this code broke playing flipped images, at least with --opengl-pbo --gpu-api=opengl. Flipping is rather a shitmess. The main problem is that OpenGL does not support flipped uploading. The original vo_gl implementation considered it important to handle the flipped case efficiently, so instead of uploading the image line by line backwards, it uploaded it flipped, and then flipped it in the renderer (basically the upload path ignored the flipping). The ra code and backends probably have an insane and inconsistent mix of semantics, so fix this by never passing it flipped images in the first place. In the future, the backends should probably support flipped images directly. Fixes #5097.
* demux: attempt to accurately reflect seek range with muxed subtitleswm42017-11-101-5/+33
| | | | | | | | | | | | | | | | If subtitles are part of the stream, determining the seekable range becomes harder. Subtitles are sparse, and can have packets in irregular intervals, or even completely lack packets. The usual logic of computing the seek range by the min/max packet timestamps fails. Solve this by making the only assumption we can make: subtitle packets are implicitly demuxed along with other packets. We also assume perfect interleaving for this, but you really can't do anything with sparse packets that makes sense without this assumption. One special case is if we prune sparse packets within the current seekable range. Obviously this should limit the seekable range to after the pruned packet.
* demux: reduce indentation for two functionswm42017-11-101-37/+36
| | | | | Remove the single-exit, that added a huge if statement containing everything, just for some corner case.
* demux: some minor mostly cosmeticswm42017-11-101-13/+15
| | | | | None of these change functionality in any way (although the log level changes obviously change the terminal output).
* demux: simplify a functionwm42017-11-101-21/+19
| | | | | update_stream_selection_state() doesn't need all these arguments. Not sure what I was thinking here.
* demux: change how refreshes on track switching are handledwm42017-11-101-66/+59
| | | | | | Instead of weirdly deciding this on every packet read and having the code far away from where it's actually needed, just run it where it's actually needed.
* demux: get rid of weird backwards loopwm42017-11-101-1/+1
| | | | | | A typical idiom for calling functions that remove something from the array being iterated, but it's not needed here. I have no idea why this was ever done.
* demux: avoid broken readahead when joining rangeswm42017-11-101-4/+5
| | | | | | | | | | Setting ds->refreshing for unselected streams could lead to a nonsensical queue overflow warning, because read_packet() took it as a reason to continue reading. Also add some more information to the queue overflow warning (even if that one doesn't have anything to do with this bug - it was for unselected streams only).
* demux: reduce difference between threaded and non-threaded modewm42017-11-101-27/+35
| | | | | | | | | | | This fixes an endless loop with threading disabled, such as for example when playing a file with an external subtitle file, and seeking to the beginning. Something will set in->seeking, but the seek is never executed, which made demux_read_packet() loop endlessly. (External subtitles will use non-threaded mode for whatever reasons.) Fix this by by making the unthreaded code to execute the worker thread body, which reduces the difference in logic.
* demux: support multiple seekable cached rangeswm42017-11-094-230/+610
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, the demuxer cache was limited to a single range. Extend this to multiple range. Should be useful for slow network streams. This commit changes a lot in the internal demuxer cache logic, so there's a lot of room for bugs and regressions. The logic without demuxer cache is mostly untouched, but also involved with the code changes. Or in other words, this commit probably fucks up shit. There are two things which makes multiple cached ranges rather hard: 1. the need to resume the demuxer at the end of a cached range when seeking to it 2. joining two adjacent ranges when the lowe range "grows" into it (and resuming the demuxer at the end of the new joined range) "Resuming" the demuxer means that we perform a low level seek to the end of a cached range, and properly append new packets to it, without adding packets multiple times or creating holes due to missing packets. Since audio and video never line up exactly, there is no clean "cut" possible, at which you could resume the demuxer cleanly (for 1.) or which you could use to detect that two ranges are perfectly adjacent (for 2.). The way how the demuxer interleaves multiple streams is also unpredictable. Typically you will have to expect that it randomly allows one of the streams to be ahead by a bit, and so on. To deal with this, we have heuristics in place to detect when one packet equals or is "behind" a packet that was demuxed earlier. We reuse the refresh seek logic (used to "reread" packets into the demuxer cache when enabling a track), which checks for certain packet invariants. Currently, it observes whether either the raw packet position, or the packet DTS is strictly monotonically increasing. If none of them are true, we discard old ranges when creating a new one. This heavily depends on the file format and the demuxer behavior. For example, not all file formats have DTS, and the packet position can be unset due to libavformat not always setting it (e.g. when parsers are used). At the same time, we must deal with all the complicated state used to track prefetching and seek ranges. In some complicated corner cases, we just give up and discard other seek ranges, even if the previously mentioned packet invariants are fulfilled. To handle joining, we're being particularly dumb, and require a small overlap to be confident that two ranges join perfectly. (This could be done incrementally with as little overlap as 1 packet, but corner cases would eat us: each stream needs to be joined separately, and the cache pruning logic could remove overlapping packets for other streams again.) Another restriction is that switching the cached range will always trigger an asynchronous low level seek to resume demuxing at the new range. Some users might find this annoying. Dealing with interleaved subtitles is not fully handled yet. It will clamp the seekable range to where subtitle packets are.
* appveyor: update ffmpeg and test d3d11/vulkanJames Ross-Gowan2017-11-083-16/+59
| | | | | | | Build ffmpeg-mpv, shaderc and crossc from source, since they are not packaged in MSYS2. Also, add some more explicit --enable flags to the mpv build to make sure things like D3D11, D3D11VA hwaccels and Vulkan are auto-detected.
* vo_gpu: hwdec_d3d11va: allow zero-copy video decodingJames Ross-Gowan2017-11-076-62/+176
| | | | | | | | | | | | | | Like the manual says, this is technically undefined behaviour. See: https://msdn.microsoft.com/en-us/library/windows/desktop/ff476085.aspx In particular, MSDN says texture arrays created with the BIND_DECODER flag cannot be used with CreateShaderResourceView, which means they can't be sampled through SRVs like normal Direct3D textures. However, some programs (Google Chrome included) do this anyway for performance and power-usage reasons, and it appears to work with most drivers. Older AMD drivers had a "bug" with zero-copy decoding, but this appears to have been fixed. See #3255, #3464 and http://crbug.com/623029.
* vo_gpu: d3d11: enhance cache invalidationJames Ross-Gowan2017-11-072-5/+71
| | | | | | | | The shader cache in ra_d3d11 caches the result of shaderc, crossc and the D3DCompiler DLL, so it should be invalidated when any of those components are updated. This should make the cache more reliable, which makes it safer to enable gpu-shader-cache-dir. Shader compilation is slow with D3D11, so gpu-shader-cache-dir is highly necessary
* vo_gpu: d3d11: log shader compilation timesJames Ross-Gowan2017-11-071-0/+26
| | | | | | | Some shaders take a _long_ time to compile with the Direct3D compiler. The ANGLE backend had this problem too, to a certain extent. Logging should help identify which shaders cause long stalls and could also help with benchmarking ways of reducing compile times.
* vo_gpu: move d3d11_screenshot to shared codeJames Ross-Gowan2017-11-074-80/+100
| | | | This can be used by the ANGLE backend and ra_d3d11.
* vo_gpu: d3d11: add RA caps for ra_d3d11James Ross-Gowan2017-11-077-6/+35
| | | | | | | | | | | | | | | | ra_d3d11 uses the SPIR-V compiler to translate GLSL to SPIR-V, which is then translated to HLSL. This means it always exposes the same GLSL version that the SPIR-V compiler supports (4.50 for shaderc/glslang.) Despite claiming to support GLSL 4.50, some features that are tied to the GLSL version in OpenGL are not supported by ra_d3d11 when targeting legacy Direct3D feature levels. This includes two features that mpv relies on: - Reading from gl_FragCoord in the fragment shader (requires FL 10_0) - textureGather from any texture component (requires FL 11_0) These features have been exposed as new RA caps.
* vo_gpu: d3d11: initial implementationJames Ross-Gowan2017-11-0714-16/+2797
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a new RA/vo_gpu backend that uses Direct3D 11. The GLSL generated by vo_gpu is cross-compiled to HLSL with SPIRV-Cross. What works: - All of mpv's internal shaders should work, including compute shaders. - Some external shaders have been tested and work, including RAVU and adaptive-sharpen. - Non-dumb mode works, even on very old hardware. Most features work at feature level 9_3 and all features work at feature level 10_0. Some features also work at feature level 9_1 and 9_2, but without high-bit- depth FBOs, it's not very useful. (Hardware this old is probably not fast enough for advanced features anyway.) Note: This is more compatible than ANGLE, which requires 9_3 to work at all (GLES 2.0,) and 10_1 for non-dumb-mode (GLES 3.0.) - Hardware decoding with D3D11VA, including decoding of 10-bit formats without truncation to 8-bit. What doesn't work / can be improved: - PBO upload and direct rendering does not work yet. Direct rendering requires persistent-mapped PBOs because the decoder needs to be able to read data from images that have already been decoded and uploaded. Unfortunately, it seems like persistent-mapped PBOs are fundamentally incompatible with D3D11, which requires all resources to use driver- managed memory and requires memory to be unmapped (and hence pointers to be invalidated) when a resource is used in a draw or copy operation. However it might be possible to use D3D11's limited multithreading capabilities to emulate some features of PBOs, like asynchronous texture uploading. - The blit() and clear() operations don't have equivalents in the D3D11 API that handle all cases, so in most cases, they have to be emulated with a shader. This is currently done inside ra_d3d11, but ideally it would be done in generic code, so it can take advantage of mpv's shader generation utilities. - SPIRV-Cross is used through a NIH C-compatible wrapper library, since it does not expose a C interface itself. The library is available here: https://github.com/rossy/crossc - The D3D11 context could be made to support more modern DXGI features in future. For example, it should be possible to add support for high-bit-depth and HDR output with DXGI 1.5/1.6.
* vo_gpu: export the GLSL format qualifier for ra_formatJames Ross-Gowan2017-11-075-14/+77
| | | | | | | | | | | | | | | Backported from @haasn's change to libplacebo, except in the current RA, there's nothing to indicate an ra_format can be bound as a storage image, so there's no way to force all of these formats to have a glsl_format. Instead, the layout qualifier will be removed if glsl_format is NULL. This is needed for the upcoming ra_d3d11 backend. In Direct3D 11, while loading float values from unorm images often works as expected, it's technically undefined behaviour, and in Windows 10, it will cause the debug layer to spam the log with error messages. Also, apparently in GLSL, the format name must match the image's format exactly (but in Direct3D, it just has to have the same component type.)
* vo_gpu: add namespace query mechanismJames Ross-Gowan2017-11-074-10/+21
| | | | | | Backported from @haasn's change to libplacebo. More flexible than the previous "shared || non-shared" distinction. The extra flexibility is needed for Direct3D 11, but it also doesn't hurt code-wise.
* vo_lavc: remove messy delayed subtitle rendering logicwm42017-11-071-10/+8
| | | | | | | | | | | | | | | | | | | | | For some reason vo_lavc's draw_image can buffer the frame and encode it only later. Also, there is logic for rendering the OSD (i.e. subtitles) only when needed. In theory this can lead to subtitles being pruned before it tries to render them (as the subtitle logic doesn't know that the VO still needs them later), although this probably never happens in reality. The worse issue, that actually happened, is that if the last frame gets buffered, it attempts to render subtitles in the uninit callback. At this point, the subtitle decoder is already torn down and all subtitles removed, thus it will draw nothing. This didn't always happen. I'm not sure why - potentially in the working cases, the frame wasn't buffered. Since this logic doesn't have much worth, except a minor performance advantage if frames with subtitles are dropped, just remove it. Hopefully fixes #4689.
* player: change license of some code surrounding --frames to LGPLwm42017-11-063-14/+0
| | | | The original author of the patch has agreed now.
* img_format: remove some guards against old ffmpeg APIwm42017-11-061-5/+2
| | | | These are always present in ffmpeg-mpv, and never in Libav.
* demux: explicitly discard 0 sized packetswm42017-11-061-1/+1
| | | | | | | libavcodec can't deal with them, because its API doesn't distinguish between 0 sized packets and sending EOF. As such, keeping them doesn't do any good, ever. This actually fixes some obscure mkv sample (see previous commit).
* demux_mkv: allow 0 sized packetswm42017-11-061-1/+1
| | | | | | | | | | Fixes some obscure sample that uses fixed size laces with 0-sized lace size. Some broken shit. (Maybe the decoder wouldn't care about these packets, but the demuxer attempted to resync after these packet reading errors, even though they were perfectly recoverable. But I don't care enough about this.) Sample link: https://samples.ffmpeg.org/Matroska/switzler084d_dl.mkv
* demux: slightly simplify pruningwm42017-11-061-7/+2
| | | | | We can compute the overhead size easily now - no need for awkward incremental updates to cached values on top of it.
* demux_mkv: rewrite packet reading to avoid 1 memcpy()wm42017-11-055-136/+132
| | | | | | | | | | | | | | | | | | | | | | | This directly reads individual mkv sub-packets (block laces) into a dedicated AVBufferRefs, which can be directly used for creating packets without a additional copy of the packet data. This also means we switch parsing of block header fields and lacing metadata to read directly from the stream, instead of a memory buffer. This could have been much easier if libavcodec didn't require padding the packet data with zero bytes. We could just have each packet reference a slice of the block data. But as it is, the only way to get padding without a copy is to read the laces into individually allocated (and padded) memory block, which required a larger rewrite. This probably makes recovering from broken mkv files slightly worse if the transport is unseekable. We just read, and then check if we've overread. But I think that shouldn't be a real concern. No actual measureable performance change. Potential for some regressions, as this is quite intrusive, and touches weird obscure shit like mkv lacing. Still keeping it because I like how it removes some redundant EBML parsing functions.
* demux: refactoring in preparation for multiple seek range supportwm42017-11-041-83/+214
| | | | | | | | | | | This adds a bunch of stuff (mostly unused or redundant) as preparation for supporting multiple seek ranges. Actual support is probably still far away. One change that messes deeper with the actual code is that we account for total buffered bytes instead of just the backwards bytes now. This way, prune_old_packets() doesn't have to iterate over all seek ranges to determine whether something needs pruning.
* demux: improve and optimize cache pruning and seek range determinationwm42017-11-044-85/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | The main purpose of this commit is avoiding any hidden O(n^2) algorithms in the code for pruning the demuxer cache, and for determining the seekable boundaries of the cache. The old code could loop over the whole packet queue on every packet pruned in certain corner cases. There are two ways how to reach the goal: 1) commit a cardinal sin 2) do everything incrementally The cardinal sin is adding an extra field to demux_packet, which caches the determined seekable range for a keyframe range. demux_packet is a rather general data structure and thus shouldn't have any fields that are not inherent to its use, and are only needed as an implementation detail of code using it. But what are you gonna do, sue me? In the future, demux.c might have its own packet struct though. Then the other existing cardinal sin (the "next" field, from MPlayer times) could be removed as well. This commit also changes slightly how the seek end is determined. There is a note on the manpage in case anyone finds the new behavior confusing. It's somewhat cleaner and might be needed for supporting multiple ranges (although that's unclear).
* demux: reduce overhead when searching over keyframe rangeswm42017-11-041-31/+37
| | | | | | | | | | The demuxer cache seeking mechanism looks at keyframe ranges to determine the earlierst PTS of a packet. Instead of looping over all packets twice (once to find the next keyframe, a second time to find the seek PTS), do it in one go. For that basically turn recompute_keyframe_target_pts() into an iteration functionn. Functionality should be unchanged with this commit.
* player: log if NDEBUG is definedwm42017-11-041-0/+3
| | | | I sure want to know whether assert()s were unexpectedly not compiled in.
* demux: avoid excessive readahead after cache seekwm42017-11-041-1/+3
| | | | | | | | | | | | The base_ts field is used to guess the decoder position, and when set to NOPTS, it just read ahead arbitrarily. Also demux_add_packet() sets base_ts to the new timestamp when appending a packet, which would also make it readahead by a too large amount. Fix this by setting base_ts after a seek. This assumes that normally, a cached seek target will always have the timestamp set. This is actually not quite clear (as it calls recompute_keyframe_target_pts(), which looks at multiple packets), but maybe it works well enough.
* demux: make pruning more efficient for unseekable demuxer cachewm42017-11-041-19/+24
| | | | | | | | | Don't do any of the extra work related to pruning the backbuffer if demuxer cache seeking is disabled. As a small extra, always prune packets with no timestamps immediately, or queue heads that are not keyframes. (Both of them would be pruned from the backbuffer by the normal logic anyway.)
* demux: on queue overflow wake up reader thread on EOF onlywm42017-11-031-1/+1
| | | | This seems to make more sense.
* demux: don't show queue overflow warning when merely prefetchingwm42017-11-031-6/+7
| | | | | | | | | | | | | If fulfilling --demuxer-readahead-secs requires more memory than allowed by --demuxer-max-bytes, the queue obviously overflows. But the warning is normally only for the case when trying to find the next video or audio packet fails, and decoding can't continue. Use a separate variable for determining whether to prefetch, and if the queue has overflown, skip the message. In fact, skip the EOF setting and waking up the decoder thread as well, because the EOF flag should not be (have been) set in this situation, and waking up the reader thread helps only if the EOF state changed.
* vo_gpu: don't re-render hwdec frames when repeating frameswm42017-11-031-19/+11
| | | | | | | | | | | | | Repeating frames (for display-sync) is not supposed to render the entire frame again. When using hardware decoding, it unfortunately did: the renderer uses the frame ID to check whether the frame data changed, and unmapping the hwdec frame clears it. Essentially reverts commit 761eeacf5407cab07. Back then I probably thought it would be a good idea to release the hwdec image quickly in order to return it to the decoder, but they're referenced anyway. This should increase the performance and reduce GPU work.
* demux: don't allow subtitles to mess up buffered time displaywm42017-11-031-1/+2
| | | | | | | | | | | In a shit show of subtle corner case interactions, making the demuxer cache buffer the entire file can display a small buffered time if subtitles are enabled. The reason is that some subtitle decoders may read in advance infinitely, i.e. they read the entire subtitle stream. Then, since the other streams (audio/video) have logically reached EOF, and the subtitle stream is set to ds->active==true. This will have to be fixed properly later to account buffering for subtitle-only files (another corner case) correctly, but for now this is less annoying.
* osc: make cycling visibility an input.conf key bindingwm42017-11-032-1/+2
| | | | | As builtin script, it should not register global key bindings, and add them to input.conf instead. This is similar to what stats.lua does.
* demux: add option to create CC tracks eagerlywm42017-11-032-15/+58
| | | | | | | | | | | | We don't hope to auto-detect them at load time, as that would be too much of a pain - even FFmpeg requires fetching and parsing of video packets, and exposes the information only via deprecated API. But there still needs to be a way to select them by default. This is also needed to get the first CC packet at all (without seeking back). This commit also attempts to clean up locking a bit, which is a PITA, but it's better be careful & clean.
* vo_gpu: ra_gl: fix minimum GLSL version to 120wm42017-11-031-1/+1
| | | | Not sure why there was 110, or why there is even a default.
* vo_gpu: fix mobius tone mapping compatibility to GLSL 120wm42017-11-031-1/+1
| | | | | | | | | Normally such code is didsabled by have_mglsl==false in check_gl_features(), but apparently not this one. Just fix it. Seems also more readable. Fixes #5069.
* vo_gpu: ra_gl: fix crash trying to use glBindBufferBase on GL 2.1wm42017-11-031-2/+7
| | | | | | | Apparently this is required, but it doesn't check for it. To be fair, this was tested by creating a compatibility context and pretending it's GL 2.1. GL_ARB_shader_storage_buffer_object actually requires GL 4.0 or up, but GL_ARB_uniform_buffer_object requires only GL 2.0.
* wscript: use pkg-config for vulkan checkMartin Herkt2017-11-031-1/+1
|
* demux_mkv: add V_SNOW tag to mkv_video_tagsNicolas F2017-11-031-0/+1
| | | | | Apparently, and to nobody's surprise, mkv can contain snow. It does so with the V_SNOW tag. We can now play back snow-inside-mkv in mpv.
* osc: render seek ranges a bit less uglyChrisK22017-11-021-1/+1
|
* osc: render seek rangeswm42017-11-021-0/+26
| | | | | | Pretty fucking ugly, but I'm not a UI designer. Of course only does something with --demuxer-seekable-cache.
* stream_libarchive: stop reading on ARCHIVE_FATALwm42017-11-021-4/+41
| | | | | | | | | | | | | | | | According to https://github.com/libarchive/libarchive/pull/773#issuecomment-334892291 we're not allowed to "continue reading" (post above) or performing "more operations" (comments in archive.h header), whatever that means. Assume closing and freeing the archive is still ok. Since the codec already includes logic for closing and reopening the archive for seeking in unseekable archives, this probably isn't too bad. Untested due to lack of crashing sample (I lost my original test case, and as recently user-provided one didn't crash).
* screenshot: create directories from templateNicolas F2017-11-021-0/+5
| | | | | screenshot-template could be set to e.g. "%F/%04n", so we want to make sure that the path generated from the template actually exists.
* osd: don't skip leading whitespace on the first line eitherwm42017-11-021-1/+1
| | | | Stupid libass.
* vd_lavc: restore --hwdec-image-format and d3d11 opaque modewm42017-11-022-1/