avcodec/mips: [loongson] fix improper use of register constraints.
Constraint "g" means the compiler can store the variable in memory or in a register.
When constraint "g" is used for a variable that is operated on by an instruction
which only supports register operands, it may lead to an "invalid operands" error.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/mips: [loongson] reoptimize put and add pixels clamped functions.
Simplify the usage of the intermediate variable addr and remove the unused variable all64
in the following functions:
1. ff_put_pixels_clamped_mmi
2. ff_put_signed_pixels_clamped_mmi
3. ff_add_pixels_clamped_mmi
This optimization speeds up mpeg4 decoding by about 2% on the loongson platform (tested with 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/mips: [loongson] simplify the usage of intermediate variable addr.
Simplify the usage of the intermediate variable addr in the following functions:
1. ff_put_pixels4_8_mmi
2. ff_put_pixels8_8_mmi
3. ff_put_pixels16_8_mmi
4. ff_avg_pixels16_8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec: [loongson] fix bug of mss2-wmv failed in fate test.
Failed case: mss2-wmv
In the following functions, pmullh was used to multiply two 16-bit values, which causes data overflow:
1. ff_vc1_inv_trans_8x8_dc_mmi
2. ff_vc1_inv_trans_8x8_mmi
3. ff_vc1_inv_trans_8x4_mmi
4. ff_vc1_inv_trans_4x8_mmi
5. ff_vc1_inv_trans_4x4_mmi
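To see why a 16-bit multiply can lose data, the sketch below models a single lane in Python. It assumes pmullh keeps only the low 16 bits of each signed 16-bit product; the helper names and sample values are hypothetical and are not taken from the VC-1 code.

```python
def to_int16(value):
    """Wrap an integer to a signed 16-bit value, as a 16-bit lane would."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mul_low16(a, b):
    """Multiply two signed 16-bit values and keep only the low 16 bits."""
    return to_int16(a * b)


# Illustrative values only: the full product needs more than 16 bits.
a, b = 3000, 12
print(a * b)            # 36000, the full-precision product
print(mul_low16(a, b))  # -29536, what survives in a 16-bit lane
```

Small operands are unaffected (100 * 12 still gives 1200), which is why this kind of bug may only show up on specific streams.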
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/ra144: Fix undefined integer overflow in add_wav()
Fixes: signed integer overflow: -26884 * 91439 cannot be represented in type 'int'
Fixes: 9687/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RA_144_fuzzer-4995588121690112
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
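The reported product can be checked by hand. The sketch below assumes a 32-bit 'int' (the usual case on the platforms involved) and only verifies the arithmetic from the fuzzer report; it does not reproduce the actual change made to add_wav().

```python
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def fits_int32(value):
    """Return True if value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


# Operands taken from the sanitizer message above.
product = -26884 * 91439
print(product)              # -2458246076
print(fits_int32(product))  # False: below INT32_MIN (-2147483648)
```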
Reoptimize functions ff_put_h264_chroma_mc8_mmi and ff_avg_h264_chroma_mc8_mmi.
Performance of h264 decoding improved by about 5% (from 69fps to 73fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Shiyou Yin [Fri, 31 Aug 2018 13:41:49 +0000 (21:41 +0800)]
avcodec/mips: [loongson] reoptimize simple idct with mmi.
Performance of mpeg4 decoding improved by about 23% (from 128fps to 158fps, tested on loongson 3A3000).
Reoptimized the following functions with mmi:
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Dale Curtis [Thu, 30 Aug 2018 22:18:25 +0000 (15:18 -0700)]
avformat/mov: Error on too large stsd entry counts.
Entries are always at least 8 bytes per the parsing code, so if we
see an impossible entry count avoid massive allocations. This is
similar to an existing check in mov_read_stsc().
Since ff_mov_read_stsd_entries() does eof checks, an alternative
approach could be to clamp the entry count to atom.size / 8.
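Both approaches mentioned above can be sketched as follows. This is a simplified Python model, not the code in mov.c: the function names are hypothetical, and it assumes atom_size is the number of bytes still available for entries.

```python
MIN_ENTRY_SIZE = 8  # per the commit message: entries are at least 8 bytes


def entry_count_is_plausible(entries, atom_size):
    """Return True if `entries` entries could fit into `atom_size` bytes."""
    return 0 <= entries <= atom_size // MIN_ENTRY_SIZE


def clamp_entry_count(entries, atom_size):
    """Alternative approach: clamp the count instead of rejecting it."""
    return max(0, min(entries, atom_size // MIN_ENTRY_SIZE))


# A 64-byte atom cannot hold more than 8 entries, whatever the header claims.
print(entry_count_is_plausible(8, 64))
print(entry_count_is_plausible(2 ** 31 - 1, 64))
print(clamp_entry_count(2 ** 31 - 1, 64))
```

Rejecting the count fails early on corrupt files, while clamping relies on later eof checks to stop the parse.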
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
- Allow adding deps in any order rather than "in linking order".
- Expand deps chains as required rather than just once.
- Validate that there are no cycles.
- Validate that [after expansion] deps are limited to other fflibs.
- Remove expectation for a specific output order of unique().
Previously when adding items to <fflib>_deps, developers were
required to add them in linking order. This can be awkward and
bug-prone, especially when a list is not empty, e.g. when adding
conditional deps.
It also implicitly expected unique() to keep the last instance of
recurring items such that these lists maintain their linking order
after removing duplicate items.
This patch mainly allows adding deps in any order by keeping just
one master list in linking order, and then reordering all the
<fflib>_deps lists to align with the master list order.
This master list is LIBRARY_LIST itself, where otherwise its order
doesn't matter.
The patch also removes a limit where these deps lists were expanded
only once. This could have resulted in incomplete expanded lists,
or forcing devs to add already-deducible deps to avoid this issue.
Note: it is possible to deduce the master list order automatically
from the deps lists, but in this case it's probably not worth the
added complexity, even if minor. Maintaining one list should be OK.
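The idea can be sketched in Python as below. This is not the configure implementation (which is shell); the library names, the dependency table, and the function names are all made up for illustration. Deps are expanded transitively, cycles are rejected, deps outside the master list are rejected, and the result is reordered to follow the master list.

```python
def expand_deps(lib, deps, stack=()):
    """Return the full (transitive) set of deps of `lib`, rejecting cycles."""
    if lib in stack:
        raise ValueError("dependency cycle involving " + lib)
    result = set()
    for dep in deps.get(lib, []):
        result.add(dep)
        result |= expand_deps(dep, deps, stack + (lib,))
    return result


def ordered_deps(lib, deps, master):
    """Expand the deps of `lib` and align them with the master list order."""
    expanded = expand_deps(lib, deps)
    unknown = expanded - set(master)
    if unknown:
        raise ValueError("deps outside the master list: " + ", ".join(sorted(unknown)))
    return [name for name in master if name in expanded]


# Hypothetical libraries; MASTER is the single list kept in linking order.
MASTER = ["top", "mid", "low", "base"]
DEPS = {"top": ["base", "mid"], "mid": ["low"], "low": ["base"], "base": []}

print(ordered_deps("top", DEPS, MASTER))  # ['mid', 'low', 'base']
```

Note that "top" lists its deps out of linking order and omits "low", yet the result is complete and correctly ordered.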
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Carl Eugen Hoyos [Thu, 30 Aug 2018 22:43:17 +0000 (00:43 +0200)]
lavc/v4l2_m2m_enc: Add missing braces around initializers.
Fixes the following warnings:
libavcodec/v4l2_m2m_enc.c:51:12: warning: missing braces around initializer
libavcodec/v4l2_m2m_enc.c:71:12: warning: missing braces around initializer
Add a gcc version check before adding the -fno-expensive-optimizations flag.
The flag is only needed when the gcc version is lower than 5.3.0.
For more information on the bug, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67736
https://gcc.gnu.org/ml/gcc-patches/2012-05/msg00401.html
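The version comparison itself is simple but easy to get wrong when done on strings. A minimal Python sketch of the intended condition follows; the function names are hypothetical and the real check lives in the build scripts, not in Python.

```python
def parse_version(text):
    """Turn 'major.minor.patch' into a 3-tuple, padding missing parts with 0."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def needs_workaround_flag(gcc_version):
    """Return True if -fno-expensive-optimizations should be added."""
    return parse_version(gcc_version) < (5, 3, 0)


for version in ("4.9.2", "5.2.0", "5.3.0", "7.3.0"):
    print(version, needs_workaround_flag(version))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "10.1.0" would sort before "5.3.0".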
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
check_deps() recursively enables/disables components, and its loop is
iterated nearly 6000 times. It's particularly slow in bash - currently
consuming more than 50% of configure runtime, and about 20% with other
shells.
This commit applies a few local optimizations, most effective first:
- Use $1 $2 ... instead of pushvar/popvar, and same at enable_deep*
- Abort early in one notable case - empty deps, to avoid costly no-op.
- Smaller changes which do add up:
  - Handle ${cfg}_checking locally instead of via enable[d]/disable
  - ${cfg}_checking: test done before inprogress - x2 faster in 50%+
  - one eval instead of several at the empty-deps early abort path.
- The "actual work" part is unmodified - just its surroundings.
Biggest speedups (relative and absolute) are observed with bash.
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Tested-by: Helmut K. C. Tessarek <tessarek@evermeet.cx>
Tested-by: Dave Yeo <daveryeo@telus.net>
Tested-by: Reino Wijnsma <rwijnsma@xs4all.nl>
Signed-off-by: James Almer <jamrial@gmail.com>
Inside print_enabled_components, the filter_list case invokes sed
about 350 times to parse the same source file and extract different
info for each arg. This is never instant, and on systems where fork is
slow (notably MSYS2/Cygwin on Windows) it takes many seconds.
Change it to use sed once on the source file and set env vars with the
parse results, then use these results inside the loop.
Additionally, the cases of indev_list and outdev_list are very
infrequent, but nevertheless they're faster, and arguably cleaner, with
shell parameter substitutions than with command substitutions.
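The "parse once, look up many times" pattern can be sketched in Python as below. The sample declarations and the regular expression are assumptions about the general shape of the source file, not a copy of it, and the function name is hypothetical.

```python
import re

# Hypothetical excerpt standing in for the real source file.
SAMPLE_SOURCE = """\
extern AVFilter ff_af_volume;
extern AVFilter ff_vf_scale;
extern AVFilter ff_vsrc_testsrc;
"""

DECLARATION = re.compile(r"^extern AVFilter ff_([a-z]+)_(\w+);$", re.MULTILINE)


def parse_once(source):
    """Scan the source a single time and map short name -> full name."""
    return {name: kind + "_" + name for kind, name in DECLARATION.findall(source)}


# One pass over the file, then cheap dictionary lookups inside the loop.
table = parse_once(SAMPLE_SOURCE)
for name in ("scale", "volume", "testsrc"):
    print(name, "->", table[name])
```

The cost of the scan is paid once instead of once per argument, which matters most where spawning a process is expensive.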
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Tested-by: Helmut K. C. Tessarek <tessarek@evermeet.cx>
Tested-by: Dave Yeo <daveryeo@telus.net>
Tested-by: Reino Wijnsma <rwijnsma@xs4all.nl>
Signed-off-by: James Almer <jamrial@gmail.com>
Currently configure spends 50-70% of its runtime inside a single
function: flatten_extralibs[_wrapper] - which does string processing.
During its run, nearly 20K command substitutions (subshells) are used,
including its callees unique() and resolve(), which is the reason
for its lengthy run.
This commit avoids all subshells during its execution, speeding it up
by about two orders of magnitude, and reducing the overall configure
runtime by 50-70%.
resolve() is rewritten to avoid subshells, and in unique() and
flatten_extralibs() we "inline" the filter[_out] functionality.
Note that logically, "unique" functionality has more than one possible
output (depending on which of the recurring items is kept). As it
turns out, other parts expect the last recurring item to be kept
(which was the original behavior of unique()). This patch preserves
its output order.
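The "keep the last recurring item" behavior can be illustrated with a short Python sketch. This is not the shell code from configure; the function name and the sample flags are hypothetical.

```python
def unique_keep_last(items):
    """Remove duplicates, keeping the LAST occurrence of each item."""
    seen = set()
    kept = []
    # Walk from the end so the last occurrence is the one encountered first.
    for item in reversed(items):
        if item not in seen:
            seen.add(item)
            kept.append(item)
    kept.reverse()
    return kept


flags = ["-lfoo", "-lm", "-lbar", "-lm"]
print(unique_keep_last(flags))  # ['-lfoo', '-lbar', '-lm']
```

Keeping the first occurrence instead would yield -lfoo -lm -lbar, which places -lm before a library that may need it and can break static linking.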
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Tested-by: Helmut K. C. Tessarek <tessarek@evermeet.cx>
Tested-by: Dave Yeo <daveryeo@telus.net>
Tested-by: Reino Wijnsma <rwijnsma@xs4all.nl>
Signed-off-by: James Almer <jamrial@gmail.com>
Zhong Li [Thu, 28 Jun 2018 09:01:46 +0000 (17:01 +0800)]
lavc/encode: fix frame_number double-counted
Encoder frame_number may be double-counted if some frames are cached and then flushed.
Take the qsv encoder (some frames are cached first for asynchronism) as an example:
./ffmpeg -loglevel verbose -hwaccel qsv -c:v h264_qsv -i in.mp4 -vframes 100 -c:v h264_qsv out.mp4
frame_number passed to encoder is double-counted and larger than the accurate value.
Libx264 encoding with B frames can also reproduce it.