golomb: always check for invalid UE golomb codes in get_ue_golomb
Also correct the check to reject log < 7, because UPDATE_CACHE only
guarantees 25 meaningful bits.
This fixes undefined behavior:
runtime error: shift exponent is negative
Testing with START/STOP timers in get_ue_golomb, one for the first
branch (A) and one for the second (B), shows that there is practically no
slowdown, e.g. for the cavs decoder:
With the check in the B branch:
629 decicycles in get_ue_golomb B, 4194260 runs, 44 skips
433 decicycles in get_ue_golomb A,268434102 runs, 1354 skips
Without the check:
624 decicycles in get_ue_golomb B, 4194273 runs, 31 skips
433 decicycles in get_ue_golomb A,268434203 runs, 1253 skips
Since the B branch is executed far less often than the A branch, this
change is negligible, even more so for the h264 decoder, where the ratio
B/A is a lot smaller.
Fixes: mozilla bug 1230239 Fixes: fbeb8b2c7c996e9b91c6b1af319d7ebc/asan_heap-oob_195450f_2743_e8856ece4579ea486670be2b236099a0.bit Found-by: Tyson Smith Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Having a configure option with the same name as a MIPS ISA is confusing,
so better to remove it. This option was being used to add some
optimizations to a specific core (i6400). We will add the optimizations
just when the i6400 core has been detected, in a later patch.
Signed-off-by: Vicente Olivert Riera <Vincent.Riera@imgtec.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Jean Delvare [Thu, 10 Dec 2015 14:25:57 +0000 (15:25 +0100)]
avfilter/vf_delogo: Use AVPixFmtDescriptor.nb_components
Relying on AVPixFmtDescriptor.nb_components is cleaner and faster than
checking data and linesize for every possible plane.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
zjh8890 [Sat, 21 Nov 2015 16:07:35 +0000 (00:07 +0800)]
avcodec/aarch64/neon.S: Update neon.s for transpose_4x4H
The transpose_4x4H is wrong which cost me much time to find this bug. The orders of r2 and r3 are wrong,
this bug waste me much time while I make aarch64 arm instruction which used the function.
Tom Marecek [Thu, 10 Dec 2015 18:54:09 +0000 (18:54 +0000)]
ffmpeg: change command line option -dump to work without -loglevel debug
-hex and -dump command line options do nothing unless -loglevel debug is set.
-dump by itself is useful for monitoring live streams (to get the current PTS for example) however when it is used with -loglevel debug for an RTMP stream, librtmp also dumps the packet data which makes the output too noisy.
do_pkt_dump is only set in check_keyboard_interaction or by the -dump command line option so this change should have no effect on any other parts of the code..
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If the input contains too many too large values, the imdct can overflow.
Even if it didn't, the output would be larger than the valid range of 29
bits.
Note that this is a very delicate limit: Allowing values up to 1<<25
does not prevent input larger than 1<<29 from arriving at
sbr_sum_square, while limiting values to 1<<23 breaks the
fate-aac-fixed-al_sbr_hq_cm_48_5.1 test.
Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
This commit addresses the issue of possible resource leaks when some intermediate
call fails. Unfortunately, even leaving aside this subtle intermediate
failure aspect, commit 8087632027d755cd32ccc9e91ea025e276197055 was only
partially successful in addressing memleaks. Hopefully, this commit
fixes the issue completely.
Tested with valgrind --leak-check=full --show-leak-kinds=all, and manual simulation
of malloc/realloc failures.
avcodec/aacsbr_tablegen: always initialize tables at runtime
This gets rid of virtually useless hardcoded tables hackery. The reason
it is useless is that a 320 element lut is anyway placed regardless of
--enable-hardcoded-tables, from which all necessary tables are trivially
derived at runtime at very low cost:
sample benchmark (x86-64, Haswell, GNU/Linux, single run is really
what is relevant here since looping drastically changes the bench). Fluctuations
are on the order of 10% for the single run test:
39400 decicycles in aacsbr_tableinit, 1 runs, 0 skips
25325 decicycles in aacsbr_tableinit, 2 runs, 0 skips
18475 decicycles in aacsbr_tableinit, 4 runs, 0 skips
15008 decicycles in aacsbr_tableinit, 8 runs, 0 skips
13016 decicycles in aacsbr_tableinit, 16 runs, 0 skips
12005 decicycles in aacsbr_tableinit, 32 runs, 0 skips
11546 decicycles in aacsbr_tableinit, 64 runs, 0 skips
11506 decicycles in aacsbr_tableinit, 128 runs, 0 skips
11500 decicycles in aacsbr_tableinit, 256 runs, 0 skips
11183 decicycles in aacsbr_tableinit, 509 runs, 3 skips
Tested with FATE with/without --enable-hardcoded-tables.
Jean Delvare [Wed, 9 Dec 2015 11:07:47 +0000 (12:07 +0100)]
avfilter/vf_delogo: round to the closest value
When the interpolated value is divided by the sum of weights, no
rounding is done, which means the value is truncated. This results in
a slight bias towards dark green in the interpolated area. Rounding
properly removes the bias.
I measured this change to reduce the interpolation error by 1 to 2 %
on average on a number of sample input and logo area combinations.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/jpeg2000: replace naive pow call with smarter exp2fi
pow is a very wasteful function for this purpose. A low hanging fruit
would be simply to replace with exp2f, and that does yield some speedup.
However, there are 2 drawbacks of this:
1. It does not exploit the integer nature of the argument.
2. (minor) Some platforms lack a proper exp2f routine, making benefits available
only to non broken libm.
3. exp2f does not solve the same issue that plagues pow, namely terrible
worst case performance. This is a fundamental issue known as the
"table-maker's dilemma" recognized by Prof. Kahan himself and
subsequently elaborated and researched by many others. All this is clear from benchmarks below.
This exploits the IEEE-754 format to get very good performance even in
the worst case for integer powers of 2. This solves all the issues noted
above. Function tested with clang usan over [-1000, 1000] (beyond range of
relevance for this, which is [-255, 255]), patch itself with FATE.
Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
Low number of runs also given to prove the point about worst case:
pow:
216270 decicycles in pow, 1 runs, 0 skips
110175 decicycles in pow, 2 runs, 0 skips
56085 decicycles in pow, 4 runs, 0 skips
29013 decicycles in pow, 8 runs, 0 skips
15472 decicycles in pow, 16 runs, 0 skips
8689 decicycles in pow, 32 runs, 0 skips
5295 decicycles in pow, 64 runs, 0 skips
3599 decicycles in pow, 128 runs, 0 skips
2748 decicycles in pow, 256 runs, 0 skips
2304 decicycles in pow, 511 runs, 1 skips
2072 decicycles in pow, 1022 runs, 2 skips
1963 decicycles in pow, 2044 runs, 4 skips
1894 decicycles in pow, 4091 runs, 5 skips
1860 decicycles in pow, 8184 runs, 8 skips
exp2f:
134140 decicycles in pow, 1 runs, 0 skips
68110 decicycles in pow, 2 runs, 0 skips
34530 decicycles in pow, 4 runs, 0 skips
17677 decicycles in pow, 8 runs, 0 skips
9175 decicycles in pow, 16 runs, 0 skips
4931 decicycles in pow, 32 runs, 0 skips
2808 decicycles in pow, 64 runs, 0 skips
1747 decicycles in pow, 128 runs, 0 skips
1208 decicycles in pow, 256 runs, 0 skips
952 decicycles in pow, 512 runs, 0 skips
822 decicycles in pow, 1024 runs, 0 skips
765 decicycles in pow, 2047 runs, 1 skips
722 decicycles in pow, 4094 runs, 2 skips
693 decicycles in pow, 8190 runs, 2 skips
exp2fi:
2740 decicycles in pow, 1 runs, 0 skips
1530 decicycles in pow, 2 runs, 0 skips
955 decicycles in pow, 4 runs, 0 skips
622 decicycles in pow, 8 runs, 0 skips
477 decicycles in pow, 16 runs, 0 skips
368 decicycles in pow, 32 runs, 0 skips
317 decicycles in pow, 64 runs, 0 skips
291 decicycles in pow, 128 runs, 0 skips
277 decicycles in pow, 256 runs, 0 skips
268 decicycles in pow, 512 runs, 0 skips
265 decicycles in pow, 1024 runs, 0 skips
263 decicycles in pow, 2048 runs, 0 skips
263 decicycles in pow, 4095 runs, 1 skips
260 decicycles in pow, 8191 runs, 1 skips
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>