| |
Why is everything so horrible.
| |
This uses the normal autoprobing rules like "auto", but rejects anything
that isn't flagged as copying data back to system memory.
The chunk in command.c was dead code, so remove it instead of updating
it.
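A minimal sketch of what such a copy-back-only probe could look like; the struct, flag, and function names here are illustrative, not mpv's actual internals:

    #include <stddef.h>

    // Hypothetical sketch of "auto-copy" probing: walk the normal autoprobe
    // order, but skip any backend not flagged as copying frames back to RAM.
    struct hwdec_backend {
        const char *name;
        unsigned flags;            // HWDEC_COPYBACK set for the *-copy backends
        int (*init)(void *ctx);
    };

    #define HWDEC_COPYBACK (1u << 0)

    static const struct hwdec_backend *probe_auto_copy(
            const struct hwdec_backend *list, int count, void *ctx)
    {
        for (int i = 0; i < count; i++) {
            if (!(list[i].flags & HWDEC_COPYBACK))
                continue;                 // reject non copy-back backends
            if (list[i].init(ctx) >= 0)
                return &list[i];          // first working copy-back backend wins
        }
        return NULL;                      // fall back to software decoding
    }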
| |
The main change is with video/hwdec.h. mp_hwdec_info is made opaque (and
renamed to mp_hwdec_devices). Its accessors are mainly thread-safe (or
documented where not), which makes the whole thing saner and cleaner. In
particular, thread-safety rules become less subtle and more obvious.
The new internal API makes it easier to support multiple OpenGL interop
backends. (Although this is not done yet, and it's not clear whether it
ever will.)
This also removes all the API-specific fields from mp_hwdec_ctx and
replaces them with a "ctx" field. For d3d in particular, we drop the
mp_d3d_ctx struct completely, and pass the interfaces directly.
Remove the emulation checks from vaapi.c and vdpau.c; they are
pointless, and the checks that matter are done at the VO layer.
The d3d hardware decoders might slightly change behavior: dxva2-copy
will not use the VO device anymore if the VO supports proper interop.
This pretty much assumes that in such cases the VO will not use any
form of exclusive mode, which makes using the VO device in copy mode
unnecessary.
This is a big refactor. Some things may be untested and could be broken.
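As an illustration of the opaque-context idea, here is a rough sketch of a mutex-guarded device list with thread-safe accessors; the names and layout are assumptions, not the real mp_hwdec_devices API:

    #include <pthread.h>
    #include <stdlib.h>

    struct hwdec_ctx {
        const char *driver_name;  // e.g. "vaapi", "vdpau", "d3d11va"
        void *ctx;                // API-specific handle (VADisplay, D3D device, ...)
    };

    struct hwdec_devices {
        pthread_mutex_t lock;
        struct hwdec_ctx *dev;    // single device, for simplicity
    };

    struct hwdec_devices *hwdec_devices_create(void)
    {
        struct hwdec_devices *devs = calloc(1, sizeof(*devs));
        pthread_mutex_init(&devs->lock, NULL);
        return devs;
    }

    // Thread-safe: can be called from the decoder thread while the VO adds devices.
    struct hwdec_ctx *hwdec_devices_get(struct hwdec_devices *devs)
    {
        pthread_mutex_lock(&devs->lock);
        struct hwdec_ctx *res = devs->dev;
        pthread_mutex_unlock(&devs->lock);
        return res;
    }

    void hwdec_devices_add(struct hwdec_devices *devs, struct hwdec_ctx *dev)
    {
        pthread_mutex_lock(&devs->lock);
        devs->dev = dev;
        pthread_mutex_unlock(&devs->lock);
    }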
| |
Until now, we have made the assumption that a driver will use only one
hardware surface format. The format is dictated by the driver (you
don't create surfaces with a specific format - you just pass an
rt_format and get a surface that will be in a specific driver-chosen
format).
In particular, the renderer created a dummy surface to probe the format,
and hoped the decoder would produce the same format. Due to a driver
bug this required a workaround to actually get the same format as the
driver did.
Change this so that the format is determined in the decoder. The format
is then passed down as hw_subfmt, which allows the renderer to configure
itself with the correct format. If the hardware surface changes its
format midstream, the renderer can be reconfigured using the normal
mechanisms.
This calls va_surface_init_subformat() each time after the decoder
returns a surface. Since libavcodec/AVFrame has no concept of sub-
formats, this is unavoidable. It creates and destroys a derived
VAImage, but this shouldn't have any bad performance effects (at
least I didn't notice any measurable effects).
Note that vaDeriveImage() failures are silently ignored as some
drivers (the vdpau wrapper) support neither vaDeriveImage, nor EGL
interop. In addition, we still probe whether we can map an image
in the EGL interop code. This is important as it's the only way
to determine whether EGL interop is supported at all. With respect
to the driver bug mentioned above, it doesn't matter which format
the test surface has.
In vf_vavpp, also remove the rt_format guessing business. I think the
existing logic was a bit meaningless anyway. It's not even a given
that vavpp produces the same rt_format for output.
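A sketch of how the driver-chosen sub-format of a decoded surface can be queried with vaDeriveImage(); the surrounding hw_subfmt plumbing is omitted, and failures are deliberately ignored:

    #include <va/va.h>

    // Query the actual driver-chosen format ("sub-format") of a decoded
    // surface by deriving a VAImage from it. Failures are not fatal, since
    // some drivers (e.g. the vdpau wrapper) don't support vaDeriveImage().
    static unsigned int probe_surface_fourcc(VADisplay dpy, VASurfaceID surface)
    {
        VAImage image;
        if (vaDeriveImage(dpy, surface, &image) != VA_STATUS_SUCCESS)
            return 0;                               // unknown sub-format
        unsigned int fourcc = image.format.fourcc;  // e.g. VA_FOURCC_NV12
        vaDestroyImage(dpy, image.image_id);
        return fourcc;
    }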
| |
Until now, the presence of the process_image() callback was used to set
a delay queue with a hardcoded size. Change this to a vd_lavc_hwdec
field instead, so the decoder can explicitly set this if it's really
needed.
Do this so process_image() can be used in the VideoToolbox glue code for
something entirely unrelated.
| |
Some functions which expected a codec name (i.e. the name of the video
format itself) were passed a decoder name. Most "native" libavcodec
decoders have the same name as the codec, so this was never an issue.
This should mean that e.g. using "--vd=lavc:h264_mmal --hwdec=mmal"
should now actually enable native surface mode (instead of doing copy-
back).
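To illustrate the codec-name/decoder-name distinction with libavcodec's own API (a hypothetical standalone example; very old libavcodec additionally needs avcodec_register_all()):

    #include <libavcodec/avcodec.h>
    #include <stdio.h>

    // "h264_mmal" is a decoder implementation name; the codec (video format)
    // name is "h264", obtained via the decoder's AVCodecID.
    int main(void)
    {
        const AVCodec *dec = avcodec_find_decoder_by_name("h264_mmal");
        if (!dec)
            return 1;   // decoder not compiled into this libavcodec build
        const AVCodecDescriptor *desc = avcodec_descriptor_get(dec->id);
        // Prints: decoder=h264_mmal codec=h264
        printf("decoder=%s codec=%s\n", dec->name, desc ? desc->name : "?");
        return 0;
    }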
| |
Commit b53cb8de increased this by the number of additionally delayed
surfaces. But since this is only enabled in copy-back mode (which is
what process_image is about), the other additional surfaces, which
account for the direct rendering case, can be ignored.
| |
Facilitates hardware pipelining in particular with nvidia/dxva.
| |
They don't define FF_PROFILE_VP9_0.
Fixes #2737.
| |
There are at least 2 ways of using VAAPI without X11 (Wayland, DRM).
Remove the X11 requirement from the decoder part and the EGL interop.
This will be used by a following commit, which adds Wayland support.
The worst part of this is the decoder, which includes a bad hack for
using the decoder without any VO interop (also known as "vaapi-copy"
mode). Separate the X11 parts so that they're self-contained. For the
EGL interop code we do something similar (it's kept slightly simpler,
because it essentially only has to translate between our silly
MPGetNativeDisplay abstraction and the vaGetDisplay...() call).
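For reference, libva provides a separate display constructor per windowing system, so only the X11 path needs an X connection. A minimal sketch (the DRM render node path and error handling are assumptions):

    #include <fcntl.h>
    #include <va/va.h>
    #include <va/va_x11.h>      // vaGetDisplay(Display *)
    #include <va/va_wayland.h>  // vaGetDisplayWl(struct wl_display *)
    #include <va/va_drm.h>      // vaGetDisplayDRM(int fd)

    static VADisplay open_vaapi_display(Display *x11, struct wl_display *wl)
    {
        VADisplay dpy = NULL;
        if (x11) {
            dpy = vaGetDisplay(x11);
        } else if (wl) {
            dpy = vaGetDisplayWl(wl);
        } else {
            int fd = open("/dev/dri/renderD128", O_RDWR); // example render node
            if (fd >= 0)
                dpy = vaGetDisplayDRM(fd);
        }
        int major, minor;
        if (dpy && vaInitialize(dpy, &major, &minor) != VA_STATUS_SUCCESS)
            return NULL;
        return dpy;
    }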
| |
libavcodec does not support HEVC via VAAPI yet, so this won't work.
However, there is ongoing work to add HEVC support to VAAPI, and this
change might help with testing. (Or maybe not - but there is no harm in
this change.)
| |
All hwdec backends now use a single pixel format, and the format is
always checked.
Also, the init_decoder callback is now mandatory.
| |
Fixes problems with --vo=opengl:interpolation. The issue here is that
vo_opengl retains more surfaces than were preallocated for the decoder.
Until now, we just explicitly failed to decode frames for which no
additional surfaces were available. Since modern drivers are usually
fine with not "registering" surfaces before the decoder is created, just
allow allocating additional surfaces if needed.
(We also could probably recreate the HW decoder, since the HW decoder
should be stateless. But let's try to avoid raising the overall
complexity of the code.)
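A rough sketch of growing the surface pool on demand, using the current vaCreateSurfaces() signature; the pool bookkeeping is illustrative, not mpv's actual code:

    #include <va/va.h>

    struct surface_pool {
        VADisplay dpy;
        unsigned int rt_format;     // e.g. VA_RT_FORMAT_YUV420
        int w, h;
        VASurfaceID ids[64];
        int num;
    };

    // Called when every preallocated surface is still referenced (e.g. held
    // by vo_opengl for interpolation) and the decoder needs one more.
    static VASurfaceID pool_grow(struct surface_pool *p)
    {
        if (p->num >= (int)(sizeof(p->ids) / sizeof(p->ids[0])))
            return VA_INVALID_ID;   // hard limit; caller falls back to sw decoding
        VASurfaceID id;
        if (vaCreateSurfaces(p->dpy, p->rt_format, p->w, p->h,
                             &id, 1, NULL, 0) != VA_STATUS_SUCCESS)
            return VA_INVALID_ID;
        p->ids[p->num++] = id;
        return id;
    }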
| |
Sometime recently, hardware decoding started to fail if h264 with full
reference frames was decoded, and --vo=vaapi was used. VAAPI requires
registering all surfaces that the decoder will ever use in advance, so
if the playback chain uses more surfaces than originally allocated, we
fail and drop back to software decoding.
I'm not really sure why or when this started happening. Commit 7b9d7265
for one is not the cause - it can be reproduced with earlier commits. It
also seems to be timing dependent. Possibly it has to do with the way
vo.c retains previous surfaces, and the way they can be queued/unqueued
asynchronously.
Increasing the number of reserved additional surfaces by 1 fixes it.
(Though I have no idea where exactly all these surfaces are being used.
Or rather, _when_.)
| |
When using --hwdec=auto, about half of all systems will print:
"[vdpau] Error when calling vdp_device_create_x11: 1"
This happens because mpv will usually be linked against both vdpau and
vaapi libs, but the drivers are not necessarily available. Then trying
to load a driver will fail. This is a normal part of probing, but the
error messages were printed anyway. Silence them by explicitly
distinguishing probing.
This pretty much goes through all the layers. We now always consider
loading hw backends for vo_opengl to be "auto probed", even if a hw
backend is explicitly requested. In this case, vd_lavc will print a
warning message anyway (adjust this message a bit).
| |
This must have been some nonsense in the original vaapi mplayer patch.
While I still have no good idea what this "direct mapping" business is
about, it appears to be pretty much pointless. Nothing can hold
additional "real" surface references (due to how the API and mpv/lavc
refcounting work), so removing the additional surfaces won't break
anything. It still could be that this was for achieving additional
buffering (not reusing surfaces as soon), but we buffer some additional
data anyway. Plus, the original intention of the vaapi mplayer code was
probably increasing surface count just by 1 or 2, not actually doubling
it, and/or it was a "trick" to get to the maximum count of 21 when h264
is in use.
gstreamer-vaapi uses "ref_frames + SCRATCH_SURFACES_COUNT" here, with
SCRATCH_SURFACES_COUNT defined to 4. It doesn't appear to check the
overlay attributes at all in the decoder.
In any case, remove this nonsense.
| |
Maybe I don't know what I'm doing. I'm fairly certain though that Intel
does not know what they're doing.
| |
Before this commit, each hw backend had its own specific struct types
for the context, and some, like VDA, had none at all. Add a context struct
(mp_hwdec_ctx) that provides a somewhat generic way to pass the hwdec
context around. Some things get slightly better, some slightly more
verbose.
mp_hwdec_info is still around; it's still needed, but is reduced to its
role of handling delayed loading of the hwdec backend.
| |
So talking to a certain Intel dev, it sounded like modern VA-API drivers
are reasonably thread-safe. But apparently that is not the case. Not at
all. So add approximate locking around all vaapi API calls.
The problem appeared once we moved decoding and display to different
threads. That means the "vaapi-copy" mode was unaffected, but decoding
with vo_vaapi or vo_opengl led to random crashes.
Untested on real Intel hardware. With the vdpau emulation, it seems to
work fine - but actually it worked fine even before this commit, because
vdpau was written and designed not by morons, but by competent people
(vdpau is guaranteed to be fully thread-safe).
There is some probability that this commit doesn't fix things entirely.
One problem is that locking might not be complete. For one, libavcodec
_also_ accesses vaapi, so we have to rely on our own guesses how and
when lavc uses vaapi (since we disable multithreading when doing hw
decoding, our guess should be relatively good, but it's still a lavc
implementation detail). One other reason that this commit might not
help is Intel's amazing potential to fuck up anything that is good and
holy.
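The locking itself is straightforward; a sketch of the kind of wrapper this implies, using vaSyncSurface() as an example call:

    #include <pthread.h>
    #include <va/va.h>

    // Serialize all libva calls with one lock, since the driver itself is
    // not thread-safe. Decode and display threads both take the same lock.
    static pthread_mutex_t va_lock = PTHREAD_MUTEX_INITIALIZER;

    static VAStatus locked_vaSyncSurface(VADisplay dpy, VASurfaceID surface)
    {
        pthread_mutex_lock(&va_lock);
        VAStatus st = vaSyncSurface(dpy, surface);   // any vaXxx() call goes here
        pthread_mutex_unlock(&va_lock);
        return st;
    }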
| |
Playing with high framedrop could make it run out of surfaces. In
theory, we wouldn't need an additional surface, if we could just clear
the vo_vaapi internal surface - but doing so would probably be a pain,
so I don't care.
| |
The VO is run inside its own thread. It also does most of the video timing.
The playloop hands the image data and a realtime timestamp to the VO,
and the VO does the rest.
In particular, this allows the playloop to do other things, instead of
blocking for video redraw. But if anything accesses the VO during video
timing, it will block.
This also fixes vo_sdl.c event handling; but that is only a side-effect,
since reimplementing the broken way would require more effort.
Also drop --softsleep. In theory, this option helps if the kernel's
sleeping mechanism is too inaccurate for video timing. In practice, I
haven't ever encountered a situation where it helps, and it just burns
CPU cycles. On the other hand it's probably actively harmful, because
it prevents the libavcodec decoder threads from doing real work.
Side note:
Originally, I intended that multiple frames could be queued to the VO. But
this is not done, due to problems with OSD and certain other features.
OSD in particular is simply designed in a way that it can be neither
timed nor copied, so you do have to render it into the video frame
before you can draw the next frame. (Subtitles have no such restriction.
sd_lavc was even updated to fix this.) It seems the right solution to
queuing multiple VO frames is rendering on VO-backed framebuffers, like
vo_vdpau.c does. This requires VO driver support, and is out of scope
of this commit.
As a consequence, the VO has a queue size of 1. The existing video queue
is just needed to compute frame duration, and will be moved out in the
next commit.
| |
Found with valgrind. This is somewhat terrifying, because the VA-API
function is supposed to fill these values, and we access them only if
the API functions return success. So this shouldn't have happened.
| |
This is incomplete; the video chain will still hold some vaapi objects
after destroying the decoder and thus the vaapi context. This is very
bad. Fixing it would require something like refcounting the vaapi
context, but I don't really want to.
| |
mpv supports two hardware decoding APIs on Linux: vdpau and vaapi. Each
of these has emulation wrappers. The wrappers are usually slower and
have fewer features than their native counterparts. In particular, the libva
vdpau driver is practically unmaintained.
Check the vendor string and print a warning if emulation is detected.
Checking vendor strings is a very stupid thing to do, but I find the
thought of people using an emulated API for no reason worse.
Also, make --hwdec=auto never use an API that is detected as emulated.
This doesn't work quite right yet, because once one API is loaded,
vo_opengl doesn't unload it, so no hardware decoding will be used if the
first probed API (usually vdpau) is rejected. But good enough.
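A sketch of the vendor-string check; the exact substring matched here is an assumption, not a stable interface:

    #include <string.h>
    #include <va/va.h>

    // Flag the API as emulated by inspecting the driver vendor string.
    static int va_is_emulated(VADisplay dpy)
    {
        const char *vendor = vaQueryVendorString(dpy);
        return vendor && strstr(vendor, "VDPAU backend") != NULL;
    }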
| |
It doesn't really need to be public. Other code can just use mp_image.
The only disadvantage is that the other code needs to call an accessor
to get the VASurfaceID.
| |
Although I at first thought it would be better to have a separate
implementation for hwaccels because the differences from software images
are too large, it turns out you can actually save some code with it.
Note that the old implementation had a small memory management bug. This
got painted over in commit 269c1e1, but is hereby solved properly.
Also note that I couldn't test vf_vavpp.c (due to lack of hardware), and
I hope I didn't accidentally break it.
| |
All this code was needed for compatibility with very old libavcodec
versions only (such as Libav 9).
Includes some now-possible simplifications too.
| |
Apparently the "right" place to initialize the hardware decoder is in
the libavcodec get_format callback.
This doesn't change vda.c and vdpau_old.c, because I don't have OSX, and
vdpau_old.c is probably going to be removed soon (if Libav ever manages
to release Libav 10). So for now the init_decoder callback added with
this commit is optional.
This also means vdpau.c and vaapi.c don't have to manage and check the
image parameters anymore.
This change is probably needed for when libavcodec's VDA support gets a
new iteration of its API.
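For context, libavcodec's get_format callback looks roughly like this; init_vaapi_decoder() is a placeholder stub standing in for the init_decoder hook:

    #include <libavcodec/avcodec.h>

    // Placeholder: real code would create the VA config/context here and
    // fill avctx->hwaccel_context before returning success.
    static int init_vaapi_decoder(AVCodecContext *avctx)
    {
        (void)avctx;
        return -1;
    }

    // Pick the hwaccel pixel format from libavcodec's get_format() callback,
    // which is where the hardware decoder should be (re)initialized.
    static enum AVPixelFormat get_format_cb(AVCodecContext *avctx,
                                            const enum AVPixelFormat *fmt)
    {
        for (int i = 0; fmt[i] != AV_PIX_FMT_NONE; i++) {
            if (fmt[i] == AV_PIX_FMT_VAAPI) {
                if (init_vaapi_decoder(avctx) >= 0)
                    return fmt[i];       // hardware decoding
                break;                   // init failed; fall through
            }
        }
        return avcodec_default_get_format(avctx, fmt);  // software fallback
    }

    // Installed before avcodec_open2(): avctx->get_format = get_format_cb;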
| |
This ended up a little bit messy. In order to get an mp_log everywhere,
mostly make use of the fact that va_surface already references global
state anyway.
| |
PIX_FMT_VDA_VLD and PIX_FMT_VAAPI_VLD were never used anywhere. I'm not
sure why they were even added, and they sound like they are just for
compatibility with XvMC-style decoding, which sucks anyway.
Now that there's only a single vaapi format, remove the
IMGFMT_IS_VAAPI() macro. Also get rid of IMGFMT_IS_VDA(), which was
unused.
| |
This means most code accessing this struct must now include hwdec.h
instead of dec_video.h. I just put it into dec_video.h at first because
I thought a separate file would be a waste, but it's more proper to do
it this way, as there are too many files which include dec_video.h only
to get the mp_hwdec_info definition.
| |
VA-API's OpenGL/GLX interop is pretty bad and perhaps slow (it renders
an X11 pixmap into an FBO, has to go over X11, and probably involves one
or more copies), and this code serves more as an example than for
serious use. On the other hand, this might work much better than
vo_vaapi, even if slightly slower.
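The GLX interop path boils down to the following libva calls (a sketch; real code would cache the GL surface instead of recreating it per frame):

    #include <GL/gl.h>
    #include <va/va.h>
    #include <va/va_glx.h>

    // Copy a decoded surface into an existing GL texture via libva's GLX
    // interop (which internally goes through an X11 pixmap).
    static int copy_surface_to_texture(VADisplay dpy, VASurfaceID surface,
                                       GLuint texture)
    {
        void *gl_surface = NULL;
        if (vaCreateSurfaceGLX(dpy, GL_TEXTURE_2D, texture, &gl_surface)
                != VA_STATUS_SUCCESS)
            return -1;
        int ok = vaCopySurfaceGLX(dpy, gl_surface, surface, VA_FRAME_PICTURE)
                == VA_STATUS_SUCCESS;
        vaDestroySurfaceGLX(dpy, gl_surface);
        return ok ? 0 : -1;
    }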
| |
We had some code for checking profiles earlier, which was removed in
commits 2508f38 and adfb71b. These commits mentioned that (working) hw
decoding was sometimes prevented due to profile checking, but I can't
find the samples that showed this behavior anymore. Also, I changed my
opinion, and I think checking the profiles is something that should be
done for a better fallback to software decoding.
The checks roughly follow VLC's vdpau profile checks, although we do
not check codec levels. (VLC's profile checks aren't necessarily
completely correct, but they're a welcome help anyway.)
Add a --vd-lavc-check-hw-profile option, which skips the profile check.
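An abbreviated sketch of such a profile check for H.264 on vdpau; the mapping table is incomplete and only meant to show the fallback logic:

    #include <libavcodec/avcodec.h>
    #include <vdpau/vdpau.h>

    // Translate the profile reported by libavcodec into a vdpau decoder
    // profile; unknown profiles cause a clean fallback to software decoding
    // instead of failing mid-stream.
    static int map_h264_profile(int ff_profile, VdpDecoderProfile *out)
    {
        switch (ff_profile & ~FF_PROFILE_H264_CONSTRAINED) {
        case FF_PROFILE_H264_BASELINE: *out = VDP_DECODER_PROFILE_H264_BASELINE; return 0;
        case FF_PROFILE_H264_MAIN:     *out = VDP_DECODER_PROFILE_H264_MAIN;     return 0;
        case FF_PROFILE_H264_HIGH:     *out = VDP_DECODER_PROFILE_H264_HIGH;     return 0;
        default:                       return -1;   // unknown -> software fallback
        }
    }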
| |
These probably don't work. libavcodec doesn't seem to support them,
and neither did the original mplayer-vaapi patch.
| |
Attempting signed comparison on unsigned value.
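The class of bug being fixed, as a tiny self-contained example (the variable is hypothetical):

    #include <stdio.h>

    // Comparing an unsigned value as if it could be negative: the
    // subtraction wraps around instead of going below zero.
    int main(void)
    {
        unsigned int pending = 0;
        if (pending - 1 < 0)            // always false: wraps to UINT_MAX
            printf("never printed\n");
        if ((int)pending - 1 < 0)       // signed arithmetic behaves as intended
            printf("printed\n");
        return 0;
    }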
| |
Don't allocate a VAImage and an mp_image every time. VAImages are cached
in the surfaces themselves, and for mp_images an explicit pool is
created. The retry loop runs only once for each surface now.
This also makes use of vaDeriveImage() if possible.
| |
This code is actually quite inefficient: it reuses the (slow, simple)
screenshot code. It uses an inefficient method to read the image
(vaGetImage() instead of vaDeriveImage()), allocates new memory for
each frame that is read, and it tries all image formats again each
time.
Also, in my tests it always picked NV12 as image format, which is not
ideal if you actually want to filter the video, and vo_xv can't handle
this format without conversion either.
However, a user confirmed that it worked for him, so everything is fine.
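A sketch of that read-back path, assuming the caller already created a matching VAImage with vaCreateImage(); per-plane copying is collapsed into one memcpy for brevity:

    #include <string.h>
    #include <va/va.h>

    // Copy a decoded surface into system memory: vaGetImage() fills the
    // VAImage, which is then mapped and copied out. (vaDeriveImage() would
    // avoid the extra copy where the driver supports it.)
    static int read_back_surface(VADisplay dpy, VASurfaceID surface,
                                 VAImage *image, int w, int h,
                                 void *dst, size_t dst_size)
    {
        if (vaSyncSurface(dpy, surface) != VA_STATUS_SUCCESS)
            return -1;
        if (vaGetImage(dpy, surface, 0, 0, w, h, image->image_id) != VA_STATUS_SUCCESS)
            return -1;
        void *data = NULL;
        if (vaMapBuffer(dpy, image->buf, &data) != VA_STATUS_SUCCESS)
            return -1;
        size_t size = image->data_size < dst_size ? image->data_size : dst_size;
        memcpy(dst, data, size);        // real code would copy plane by plane
        vaUnmapBuffer(dpy, image->buf);
        return 0;
    }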
| |
Merged from pull request #246 by xylosper. Minor cosmetic changes, some
adjustments (compatibility with older libva versions), and manpage
additions by wm4.
Signed-off-by: wm4 <wm4@nowhere>
| |
Now the code does the same as the original MPlayer VAAPI patch, instead
of trying to map the profiles exactly.
See previous commit for justification and discussion.
| |
Instead of passing AVFrame. This also moves the mysterious logic about
the size of the allocated image to common code, instead of duplicating
it everywhere.
| |
This is based on the MPlayer VA API patches. To be exact, it's based on
a very stripped-down version of commit f1ad459a263f8537f6c from
git://gitorious.org/vaapi/mplayer.git.
This doesn't contain useless things like benchmarking hacks and the
demo code for GLX interop. Also, unlike in the original patch, decoding
and video output are split into separate source files (the separation
between decoding and display also makes pixel format hacks unnecessary).
On the other hand, some features not present in the original patch were
added, like screenshot support.
VA API is rather bad for actual video output. Dealing with older libva
versions or the completely broken vdpau backend doesn't help. OSD is
low quality and should be rather slow. In some cases, only one of OSD
or subtitles can be shown at a time (because OSD is drawn first, OSD is
preferred).
Also, libva can't decide whether it accepts straight or premultiplied
alpha for OSD sub-pictures: the vdpau backend seems to assume
premultiplied, while a native vaapi driver uses straight. So I picked
straight alpha. It doesn't matter much, because the blending code for
straight alpha I added to img_convert.c is probably buggy, and ASS
subtitles might be blended incorrectly.
Really good video output with VA API would probably use OpenGL and the
GL interop features, but at this point you might just use vo_opengl.
(Patches for making HW decoding work with vo_opengl have a chance of
being accepted.)
Despite these issues, decoding seems to work ok. I still got tearing
on the Intel system I tested (Intel(R) Core(TM) i3-2350M). It was also
tested with the vdpau vaapi wrapper on a nvidia system; however this
was rather broken. (Fortunately, there is no reason to use mpv's VAAPI
support over native VDPAU.)
|