This commit addresses the issue of possible resource leaks when some intermediate
call fails. Unfortunately, even leaving aside this subtle intermediate
failure aspect, commit 8087632027d755cd32ccc9e91ea025e276197055 was only
partially successful in addressing memleaks. Hopefully, this commit
fixes the issue completely.
Tested with valgrind --leak-check=full --show-leak-kinds=all, and manual simulation
of malloc/realloc failures.
avcodec/aacsbr_tablegen: always initialize tables at runtime
This gets rid of virtually useless hardcoded tables hackery. The reason
it is useless is that a 320 element lut is anyway placed regardless of
--enable-hardcoded-tables, from which all necessary tables are trivially
derived at runtime at very low cost:
sample benchmark (x86-64, Haswell, GNU/Linux, single run is really
what is relevant here since looping drastically changes the bench). Fluctuations
are on the order of 10% for the single run test:
39400 decicycles in aacsbr_tableinit, 1 runs, 0 skips
25325 decicycles in aacsbr_tableinit, 2 runs, 0 skips
18475 decicycles in aacsbr_tableinit, 4 runs, 0 skips
15008 decicycles in aacsbr_tableinit, 8 runs, 0 skips
13016 decicycles in aacsbr_tableinit, 16 runs, 0 skips
12005 decicycles in aacsbr_tableinit, 32 runs, 0 skips
11546 decicycles in aacsbr_tableinit, 64 runs, 0 skips
11506 decicycles in aacsbr_tableinit, 128 runs, 0 skips
11500 decicycles in aacsbr_tableinit, 256 runs, 0 skips
11183 decicycles in aacsbr_tableinit, 509 runs, 3 skips
Tested with FATE with/without --enable-hardcoded-tables.
Jean Delvare [Wed, 9 Dec 2015 11:07:47 +0000 (12:07 +0100)]
avfilter/vf_delogo: round to the closest value
When the interpolated value is divided by the sum of weights, no
rounding is done, which means the value is truncated. This results in
a slight bias towards dark green in the interpolated area. Rounding
properly removes the bias.
I measured this change to reduce the interpolation error by 1 to 2 %
on average on a number of sample input and logo area combinations.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/jpeg2000: replace naive pow call with smarter exp2fi
pow is a very wasteful function for this purpose. A low hanging fruit
would be simply to replace with exp2f, and that does yield some speedup.
However, there are 2 drawbacks of this:
1. It does not exploit the integer nature of the argument.
2. (minor) Some platforms lack a proper exp2f routine, making benefits available
only to non broken libm.
3. exp2f does not solve the same issue that plagues pow, namely terrible
worst case performance. This is a fundamental issue known as the
"table-maker's dilemma" recognized by Prof. Kahan himself and
subsequently elaborated and researched by many others. All this is clear from benchmarks below.
This exploits the IEEE-754 format to get very good performance even in
the worst case for integer powers of 2. This solves all the issues noted
above. Function tested with clang usan over [-1000, 1000] (beyond range of
relevance for this, which is [-255, 255]), patch itself with FATE.
Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
Low number of runs also given to prove the point about worst case:
pow:
216270 decicycles in pow, 1 runs, 0 skips
110175 decicycles in pow, 2 runs, 0 skips
56085 decicycles in pow, 4 runs, 0 skips
29013 decicycles in pow, 8 runs, 0 skips
15472 decicycles in pow, 16 runs, 0 skips
8689 decicycles in pow, 32 runs, 0 skips
5295 decicycles in pow, 64 runs, 0 skips
3599 decicycles in pow, 128 runs, 0 skips
2748 decicycles in pow, 256 runs, 0 skips
2304 decicycles in pow, 511 runs, 1 skips
2072 decicycles in pow, 1022 runs, 2 skips
1963 decicycles in pow, 2044 runs, 4 skips
1894 decicycles in pow, 4091 runs, 5 skips
1860 decicycles in pow, 8184 runs, 8 skips
exp2f:
134140 decicycles in pow, 1 runs, 0 skips
68110 decicycles in pow, 2 runs, 0 skips
34530 decicycles in pow, 4 runs, 0 skips
17677 decicycles in pow, 8 runs, 0 skips
9175 decicycles in pow, 16 runs, 0 skips
4931 decicycles in pow, 32 runs, 0 skips
2808 decicycles in pow, 64 runs, 0 skips
1747 decicycles in pow, 128 runs, 0 skips
1208 decicycles in pow, 256 runs, 0 skips
952 decicycles in pow, 512 runs, 0 skips
822 decicycles in pow, 1024 runs, 0 skips
765 decicycles in pow, 2047 runs, 1 skips
722 decicycles in pow, 4094 runs, 2 skips
693 decicycles in pow, 8190 runs, 2 skips
exp2fi:
2740 decicycles in pow, 1 runs, 0 skips
1530 decicycles in pow, 2 runs, 0 skips
955 decicycles in pow, 4 runs, 0 skips
622 decicycles in pow, 8 runs, 0 skips
477 decicycles in pow, 16 runs, 0 skips
368 decicycles in pow, 32 runs, 0 skips
317 decicycles in pow, 64 runs, 0 skips
291 decicycles in pow, 128 runs, 0 skips
277 decicycles in pow, 256 runs, 0 skips
268 decicycles in pow, 512 runs, 0 skips
265 decicycles in pow, 1024 runs, 0 skips
263 decicycles in pow, 2048 runs, 0 skips
263 decicycles in pow, 4095 runs, 1 skips
260 decicycles in pow, 8191 runs, 1 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
Sebastian Dröge [Tue, 8 Dec 2015 08:24:09 +0000 (10:24 +0200)]
avcodec/h264: Set CORRUPT flag on output frames that are not fully recovered
In the merge commit 78265fcfeee153e5e26ad4dbc7831a84ade447d6 this behaviour
was broken and the CORRUPT flag would never ever be set on a frame. However
the flag on the AVCodecContext was taken into account properly, including
AV_CODEC_FLAG2_SHOW_ALL.
The reason for this was that the recovered field of the next output picture
was always set to TRUE whenever one of the two AVCodecContext flags was set,
which made it impossible to detect later, before outputting, if the frame was
really recovered or not. Now don't set it to TRUE unless the frame is really
recovered and check the AVCodecContext flags right before outputting.
Signed-off-by: Sebastian Dröge <sebastian@centricular.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Clément Bœsch [Tue, 1 Dec 2015 14:54:31 +0000 (15:54 +0100)]
avutil/threadmessage: split the pthread condition in two
Fix a dead lock under certain conditions. Let's assume we have a queue of 1
message max, 2 senders, and 1 receiver.
Scenario (real record obtained with debug added):
[...]
SENDER #0: acquired lock
SENDER #0: queue is full, wait
SENDER #1: acquired lock
SENDER #1: queue is full, wait
RECEIVER: acquired lock
RECEIVER: reading a msg from the queue
RECEIVER: signal the cond
RECEIVER: acquired lock
RECEIVER: queue is empty, wait
SENDER #0: writing a msg the queue
SENDER #0: signal the cond
SENDER #0: acquired lock
SENDER #0: queue is full, wait
SENDER #1: queue is full, wait
Translated:
- initially the queue contains 1/1 message with 2 senders blocking on
it, waiting to push another message.
- Meanwhile the receiver is obtaining the lock, read the message,
signal & release the lock. For some reason it is able to acquire the
lock again before the signal wakes up one of the sender. Since it
just emptied the queue, the reader waits for the queue to fill up
again.
- The signal finally reaches one of the sender, which writes a message
and then signal the condition. Unfortunately, instead of waking up
the reader, it actually wakes up the other worker (signal = notify
the condition just for 1 waiter), who can't push another message in
the queue because it's full.
- Meanwhile, the receiver is still waiting. Deadlock.
This scenario can be triggered with for example:
tests/api/api-threadmessage-test 1 2 100 100 1 1000 1000
One working solution is to make av_thread_message_queue_{send,recv}()
call pthread_cond_broadcast() instead of pthread_cond_signal() so both
senders and receivers are unlocked when work is done (be it reading or
writing).
This second solution replaces the condition with two: one to notify the
senders, and one to notify the receivers. This prevents senders from
notifying other senders instead of a reader, and the other way around.
It also avoid broadcasting to everyone like the first solution, and is,
as a result in theory more optimized.
mjpegdec: consider chroma subsampling in size check
If the chroma components are subsampled, smaller buffers are allocated
for them. In that case the maximal block_offset for the chroma
components is not as large as for the luma component.
This fixes out of bounds writes causing segmentation faults or memory
corruption.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
1. Start using TNS coefficient compression.
2. Start using 3 bits per coefficient maximum for short windows.
The bits we save from these 2 changes seem to make a nice impact on the
rest of the file/windows.
3. Remove special case gain checking for short windows.
4. Modify the coefficient loop to support up to 3 windows.
The additional restrictions on TNS were something that was no in the
specifications and furthermore restricting TNS to only low energy short
windows was done to compensate for bugs elsewhere in the code.
Overall, the improvements here reduce crackling artifacts heard in very
noisy tracks.
aacenc: move the TNS search and filtering before PNS
The original plan was to have TNS use data from the PNS search to better
tune itself to noise but this was never used nor necessary. This should
slightly boost the PNS accuracy if TNS was used.
Pretty standard macros, these should help libav*
users avoid repeating ver.si.on parsing code,
which aids in compatibility-checking tasks like
identifying FFmpeg from Libav (_MICRO >= 100 check).
Something many are doing since we are not
intercompatible anymore.
Signed-off-by: Reynaldo H. Verdejo Pinochet <reynaldo@osg.samsung.com>
avcodec/hevc: Fix integer overflow of entry_point_offset
Fixes out of array read Fixes: d41d8cd98f00b204e9800998ecf8427e/signal_sigsegv_321165b_7641_077dfcd8cbc80b1c0b470c8554cd6ffb.bit Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/dirac_parser: Check that there is a previous PU before accessing it
Fixes out of array read Fixes: 99d142c47e6ba3510a74b872a1a2ae72/asan_heap-oob_11b36f4_3811_0f5c69e7609a88a580135678de1df844.dxa Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Thiss commit removes the experimental flag from the native AAC Encoder
and thus makes it the default.
After a lot of work, done by myself and Claudio Freire, the quality of
this encoder rivals and surpasses libfdk_aac in some situations. The
encoder had instability issues earlier which prevented it from having
its experimental flag removed, however the last commits done by Claudio
removed the last known source of instability and solved a lot of
problems which were previously observed. The issues were caused by the
various coding tools interfering with the scalefactor indices. Thus,
with these problems solved, it should now be possible to declare this
encoder as the default and recommend that the users should use it
instead of others provided by external libraries, as it is both faster
and has a subjectively higher quality with selected tracks.
The encoder has still yet to be fine tuned for every possible audio file
type like music or voice, so it is hoped that with the experimental flag
removed the users should be able to provide feedback and make the
encoder better than the alternatives for every type of audio and at
every bitrate.
doc/encoders.texi: update documentation for the native AAC encoder
Since the next commit removes the experimental flag from the encoder
it's better to update the documentation which has been around in its
current form for as long as the encoder itself.
This coder produces a much lower quality audio than the rest, is much
slower and is unstable. Hasn't been updated for a very long time as
well, hence it is more appropriate to remove it since it also depends on
a big burden of a code (the encode_window_bands_info function which is
just as old, just as unstable and bad and in no way modifiable or
fixable).