Hendrik Leppkes [Fri, 23 Oct 2015 11:54:25 +0000 (13:54 +0200)]
avcodec: disallow hwaccel with frame threads
HWAccels with frame threads are fundamentally flawed in avcodecs current
design, and there are several known problems ranging from image corruption
to driver crashes.
These problems come down to two design problems in the interaction of
threads and HWAccel decoding:
(1)
While avcodec prevents parallel decoding and as such simultaneous access
to the hardware accelerator from the decoding threads, it cannot account
for the user code and its access to the hardware surfaces and the hardware
itself.
This can result in image corruption or even driver crashes if the
user code locks image surfaces while they are being used by the decoder
threads as reference frames.
The current HWAccel API does not offer any way to ensure exclusive access
to the hardware or the surfaces if frame threading is used.
(2)
Initialization of the HWAccel with frame threads is non-trivial, and many
decoders had and still have issues that cause excess calls to the
get_format callback.
This will potentially cause duplicate HWAccel initialization, which in
extreme cases can even lead to driver crashes if the HWAccel is
re-initialized while the user code is actively accessing the hardware
surfaces associated with it, or lead to image corruption due to lost
reference frames.
While both of these issues are solvable, fixing (1) would at least require
a huge API redesign which would move a lot of complexity into the user
code.
The only reason the combination of frame threads and HWAccel was
considered useful is to allow a seamless fallback to multi-threaded
software decoding if the HWAccel is not available, however the issues
outlined above far outweigh this.
The proper solution for a fallback is to re-open the AVCodecContext with
threading enabled if the HWAccel failed, which is a practice commonly used
by various user applications using avcodec today already.
Tinglin Liu [Thu, 22 Oct 2015 22:04:46 +0000 (15:04 -0700)]
mov: Add support parsing QuickTime Metadata Keys.
The Apple dev specification:
https://developer.apple.com/library/mac/documentation/QuickTime/QTFF/Metadata/Metadata.html
Basically the structure is like:
|--meta
|----hdlr
|----keys
|----ilst
1) The handler type in the metadata handler atom is ‘mdta’.
2) The key and value are stored separately for each key-value pair.
The 'keys' atom stores the key table, while 'ilst' atom stores the
values corresponding to the indices in the key table.
tests/fate/aac: Add bitexact flags to fate-aac-pns-encode
This fixes a fate failure after bumping the minor version
Its unknown why this is not needed for the other aac tests,
more investigation needed but for now i dont want to leave
it broken while its investigated
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avfilter/vf_removegrain: replace qsort with AV_QSORT
filter_slice calls qsort, so qsort is in a performance critical
position. AV_QSORT is substantially faster due to the inlining of the
comparison callback. Thus, the increase in performance is worth the
increase in binary size.
Sample benchmark (x86-64, Haswell, GNU/Linux),
filter-removegrain-mode-02 (from FATE)
new:
24060 decicycles in qsort, 1 runs, 0 skips
15690 decicycles in qsort, 2 runs, 0 skips
9307 decicycles in qsort, 4 runs, 0 skips
5572 decicycles in qsort, 8 runs, 0 skips
3485 decicycles in qsort, 16 runs, 0 skips
2517 decicycles in qsort, 32 runs, 0 skips
1979 decicycles in qsort, 64 runs, 0 skips
1911 decicycles in qsort, 128 runs, 0 skips
1568 decicycles in qsort, 256 runs, 0 skips
1596 decicycles in qsort, 512 runs, 0 skips
1614 decicycles in qsort, 1024 runs, 0 skips
1874 decicycles in qsort, 2046 runs, 2 skips
2186 decicycles in qsort, 4094 runs, 2 skips
old:
246960 decicycles in qsort, 1 runs, 0 skips
135765 decicycles in qsort, 2 runs, 0 skips
70920 decicycles in qsort, 4 runs, 0 skips
37710 decicycles in qsort, 8 runs, 0 skips
20831 decicycles in qsort, 16 runs, 0 skips
12225 decicycles in qsort, 32 runs, 0 skips
8083 decicycles in qsort, 64 runs, 0 skips
6270 decicycles in qsort, 128 runs, 0 skips
5321 decicycles in qsort, 256 runs, 0 skips
4860 decicycles in qsort, 512 runs, 0 skips
4424 decicycles in qsort, 1024 runs, 0 skips
4191 decicycles in qsort, 2046 runs, 2 skips
4934 decicycles in qsort, 4094 runs, 2 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
ff_huff_build_tree uses qsort underneath. AV_QSORT is substantially
faster due to the inlining of the comparison callback. Furthermore, this
code is reasonably performance critical, since in e.g the fraps codec,
ff_huff_build_tree is called on every frame. This routine is also called
in vp6 on every frame in some circumstances.
Sample benchmark (x86-64, Haswell, GNU/Linux), vp6 from FATE:
vp6 (old):
78930 decicycles in qsort, 1 runs, 0 skips
45330 decicycles in qsort, 2 runs, 0 skips
27825 decicycles in qsort, 4 runs, 0 skips
17471 decicycles in qsort, 8 runs, 0 skips
12296 decicycles in qsort, 16 runs, 0 skips
9554 decicycles in qsort, 32 runs, 0 skips
8404 decicycles in qsort, 64 runs, 0 skips
7405 decicycles in qsort, 128 runs, 0 skips
6740 decicycles in qsort, 256 runs, 0 skips
7540 decicycles in qsort, 512 runs, 0 skips
9498 decicycles in qsort, 1024 runs, 0 skips
9938 decicycles in qsort, 2048 runs, 0 skips
8043 decicycles in qsort, 4095 runs, 1 skips
vp6 (new):
15880 decicycles in qsort, 1 runs, 0 skips
10730 decicycles in qsort, 2 runs, 0 skips
10155 decicycles in qsort, 4 runs, 0 skips
7805 decicycles in qsort, 8 runs, 0 skips
6883 decicycles in qsort, 16 runs, 0 skips
6305 decicycles in qsort, 32 runs, 0 skips
5854 decicycles in qsort, 64 runs, 0 skips
5152 decicycles in qsort, 128 runs, 0 skips
4452 decicycles in qsort, 256 runs, 0 skips
4161 decicycles in qsort, 511 runs, 1 skips
4081 decicycles in qsort, 1023 runs, 1 skips
4072 decicycles in qsort, 2047 runs, 1 skips
4004 decicycles in qsort, 4095 runs, 1 skips
avutil/tree: add additional const qualifier to the comparator
libc's qsort comparator has a const qualifier on both arguments. This
adds a missing const qualifier to exactly match the comparator API.
Existing usages of av_tree_find, av_tree_insert are appropriately
modified: type signature changes of the comparators, and removal of
unnecessary void * casts of function pointers.
avfilter/vf_deshake: use a void * comparator for consistency
For generality, qsort uses a comparator whose elements are void *. This
makes the comparator have such a form, and thus makes the void * cast of
the comparator pointer useless. Furthermore, this makes the code more
consistent with other usages of qsort across the codebase.
This fixes extra semicolons that clang 3.7 on GNU/Linux warns about.
These were trigggered when built under -Wpedantic, which essentially
checks for strict ISO compliance in numerous ways.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
ISO C requires at least one argument in the place of the ellipsis in a
variadic macro. In particular, under -pedantic, this triggers the
warning -Wgnu-zero-variadic-macro-arguments on clang 3.7.
Reviewed-by: Nicolas George <george@nsup.org> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Marton Balint [Fri, 23 Oct 2015 18:32:51 +0000 (20:32 +0200)]
ffplay: use a separate struct for the rescaled YUVA AVSubtitle rectangles
Current code segfaults since the deprecation of AVSubtitleRect.pict because it
freed/realloced AVSubtitleRect.pict.data by itself.
The new code stores the generated YUVA AVSubtitle rectangles in their own
struct and keeps the original AVSubtitle structure untouched, because
overwriting it is considered invalid API usage.
GCC (and Clang) have this useful warning that is not enabled by -Wall or
-Wextra. This will ensure that issues like those fixed in 4da52e3630343e8d3a79aef2cafcb6bf0b71e8da
will trigger warnings.
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Luca Barbato [Thu, 22 Oct 2015 03:07:05 +0000 (05:07 +0200)]
mpjpeg: Cope with multipart lacking the initial CRLF
Some server in the wild do not put the boundary at a newline
as rfc1347 7.2.1 states.
Cope with that by reading a line and if it is not empty reading
a second one.
mp3 packets all have the same duration and number of samples
if their duration indicated in the container varies then thats an
indication that they are not 1 mp3 packet each.
If this autodetection fails for some case then please contact us
and provide a testcase.
Martin Storsjö [Tue, 20 Oct 2015 19:30:03 +0000 (22:30 +0300)]
movenc: Honor flush requests with delay_moov, when some tracks lack samples
This also makes sure that a fragmented file without the empty_moov
flag (i.e. with a non-empty initial moov fragment) actually gets
written, if some of the tracks turn out to not have any samples.
Martin Storsjö [Wed, 21 Oct 2015 08:56:36 +0000 (11:56 +0300)]
rtsp: Allow $ as interleaved packet indicator before a complete response header
Some RTSP servers ("HiIpcam/V100R003 VodServer/1.0.0") respond to
our keepalive GET_PARAMETER request by a truncated RTSP header
(lacking the final empty line to indicate a complete response
header). Prior to 764ec70149, this worked just fine since we
reacted to the $ as interleaved packet indicator anywhere.
Since $ is a valid character within the response header lines, 764ec70149 changed it to be ignored there. But to keep
compatibility with such broken servers, we need to at least
allow reacting to it at the start of lines.
avfilter,swresample,swscale: use fabs, fabsf instead of FFABS
It is well known that fabs and fabsf are at least as fast and sometimes
faster than the FFABS macro, at least on the gcc+glibc combination.
For instance, see the reference:
http://patchwork.sourceware.org/patch/6735/.
This was a patch to glibc in order to remove their usages of a macro.
The reason essentially boils down to fabs using the __builtin_fabs of
the compiler, while FFABS needs to infer to not use a branch and to
simply change the sign bit. Usually the inference works, but sometimes
it does not. This may be easily checked by looking at the asm.
This also has the added benefit of reducing macro usage, which has
problems with side-effects.
Note that avcodec is not handled here, as it is huge and
most things there are integer arithmetic anyway.