git.sesse.net Git - ffmpeg/log

x86/tx_float: Fixes compilation with old yasm

Use three operand format on some instructions, and lea to load effective
addresses of tables.

Signed-off-by: James Almer <jamrial@gmail.com>

lavu/x86/tx_float: fix FMA3 implying AVX2 is available

It's the other way around - AVX2 implies FMA3 is available.

fate: add tests for JPEG-LS PAL8

Signed-off-by: James Almer <jamrial@gmail.com>

lavu/x86: add FFT assembly

This commit adds a pure x86 assembly SIMD version of the FFT in libavutil/tx.
The design of this pure assembly FFT is pretty unconventional.

On the lowest level, instead of splitting the complex numbers into
real and imaginary parts, we keep complex numbers together but split
them in terms of parity. This saves a number of shuffles in each transform,
but more importantly, it splits each transform into two independent
paths, which we process using separate registers in parallel.
This allows us to keep all units saturated and lets us use all available
registers to avoid dependencies.
Moreover, it allows us to double the granularity of our per-load permutation,
skipping many expensive lookups and allowing us to use just 4 loads per register,
rather than 8, or in case FMA3 (and by extension, AVX2), use the vgatherdpd
instruction, which is at least as fast as 4 separate loads on old hardware,
and quite a bit faster on modern CPUs).

Higher up, we go for a bottom-up construction of large transforms, foregoing
the traditional per-transform call-return recursion chains. Instead, we always
start at the bottom-most basis transform (in this case, a 32-point transform),
and continue constructing larger and larger transforms until we return to the
top-most transform.
This way, we only touch the stack 3 times per a complete target transform:
once for the 1/2 length transform and two times for the 1/4 length transform.

The combination algorithm we use is a standard Split-Radix algorithm,
as used in our C code. Although a version with less operations exists
(Steven G. Johnson and Matteo Frigo's "A modified split-radix FFT with fewer
arithmetic operations", IEEE Trans. Signal Process. 55 (1), 111–119 (2007),
which is the one FFTW uses), it only has 2% less operations and requires at least 4x
the binary code (due to it needing 4 different paths to do a single transform).
That version also has other issues which prevent it from being implemented
with SIMD code as efficiently, which makes it lose the marginal gains it offered,
and cannot be performed bottom-up, requiring many recursive call-return chains,
whose overhead adds up.

We go through a lot of effort to minimize load/stores by keeping as much in
registers in between construcring transforms. This saves us around 32 cycles,
on paper, but in reality a lot more due to load/store aliasing (a load from a
memory location cannot be issued while there's a store pending, and there are
only so many (2 for Zen 3) load/store units in a CPU).
Also, we interleave coefficients during the last stage to save on a store+load
per register.

Each of the smallest, basis transforms (4, 8 and 16-point in our case)
has been extremely optimized. Our 8-point transform is barely 20 instructions
in total, beating our old implementation 8-point transform by 1 instruction.
Our 2x8-point transform is 23 instructions, beating our old implementation by
6 instruction and needing 50% less cycles. Our 16-point transform's combination
code takes slightly more instructions than our old implementation, but makes up
for it by requiring a lot less arithmetic operations.

Overall, the transform was optimized for the timings of Zen 3, which at the
time of writing has the most IPC from all documented CPUs. Shuffles were
preferred over arithmetic operations due to their 1/0.5 latency/throughput.

On average, this code is 30% faster than our old libavcodec implementation.
It's able to trade blows with the previously-untouchable FFTW on small transforms,
and due to its tiny size and better prediction, outdoes FFTW on larger transforms
by 11% on the largest currently supported size.

doc/transforms: add documentation for the FFT transforms

Makes the code far easier to follow, and makes creating new SIMD
for the transforms far, far easier.

checkasm: add av_tx FFT SIMD testing code

This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.

lavu/tx: add parity revtab generator version

This will be used for SIMD support.

lavu: bump minor and add APIchanges entry for the lavu/tx changes

lavu/tx: add full-sized iMDCT transform flag

lavu/tx: add unaligned flag to the API

lavu/tx: add a 9-point FFT and (i)MDCT

lavu/tx: add a 7-point FFT and (i)MDCT

lavu/tx: refactor power-of-two FFT

This commit refactors the power-of-two FFT, making it faster and
halving the size of all tables, making the code much smaller on
all systems.
This removes the big/small pass split, because on modern systems
the "big" pass is always faster, and even on older machines there
is no measurable speed difference.

lavu/tx: minor code style improvements and additional comments

avcodec/exr: Return correct error code on allocation failure

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/avcodec: Actually honour the documentation of subtitle_header

It is only supposed to be freed by libavcodec for decoders, yet
avcodec_open2() always frees it on failure.
Furthermore, avcodec_close() doesn't free it for decoders.
Both of this has been changed.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

tests/fate: add a test for APNG_DISPOSE_OP_BACKGROUND

avformat/asfdec_o: Use ff_get_extradata()

Fixes: OOM
Fixes: 27240/clusterfuzz-testcase-minimized-ffmpeg_dem_ASF_O_fuzzer-5937469859823616
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/av1_metadata: don't store the inserted TD OBU in stack

Fixes: stack-use-after-return
Fixes: clusterfuzz-testcase-minimized-ffmpeg_BSF_AV1_METADATA_fuzzer-5931515701755904
Fixes: clusterfuzz-testcase-minimized-ffmpeg_BSF_AV1_METADATA_fuzzer-6105676541722624
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: James Almer <jamrial@gmail.com>

avformat/id3v2: Check end for overflow in id3v2_parse()

Fixes: signed integer overflow: 9223372036840103978 + 67637280 cannot be represented in type 'long'
Fixes: 33341/clusterfuzz-testcase-minimized-ffmpeg_dem_DSF_fuzzer-6408154041679872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avformat/mxfdec: Fix file position addition

Fixes: signed integer overflow: 9223372036854775805 + 4 cannot be represented in type 'long'
Fixes: 29927/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-5579985228267520
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avformat/wtvdec: Improve size overflow checks in parse_chunks()

Fixes: signed integer overflow: 32 + 2147483647 cannot be represented in type 'int
Fixes: 32967/clusterfuzz-testcase-minimized-ffmpeg_dem_WTV_fuzzer-5132856218222592
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Peter Ross <pross@xvid.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/faxcompr: Check remaining bits on error in decode_group3_1d_line()

Fixes: Timeout
Fixes: 32886/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TIFF_fuzzer-4779761466474496
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

tools/target_dec_fuzzer: Adjust threshold for paf video

Fixes: Timeout (long -> 2sec)
Fixes: 32790/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PAF_VIDEO_fuzzer-5497584169910272
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avformat/mov: check for pts overflow in mov_read_sidx()

Fixes: signed integer overflow: 9223372036846336888 + 4278255871 cannot be represented in type 'long'
Fixes: 32782/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6059216516284416
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/jpeglsdec: Don't presume the context to contain a JLSState

Before 9b3c46a081a9f01559082bf7a154fc6be1e06c18 every call to
ff_jpegls_decode_picture() allocated and freed a JLSState. This commit
instead put said structure into the context of the JPEG-LS decoder to
avoid said allocation. But said function can also be called from other
MJPEG-based decoders and their contexts doesn't contain said structure,
leading to segfaults. This commit fixes this: The JLSState is now
allocated on the first call to ff_jpegls_decode_picture() and stored in
the context.

Found-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/utils: Check ima wav duration for overflow

Fixes: signed integer overflow: 44331634 * 65 cannot be represented in type 'int'
Fixes: 32120/clusterfuzz-testcase-minimized-ffmpeg_dem_RSD_fuzzer-5760221223583744
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

tools/target_dec_fuzzer: adjust threshold for arbc

Fixes: Timeout (63sec -> 48ms)
Fixes: 31886/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_ARBC_fuzzer-5287235705503744
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

tools/target_dec_fuzzer: Adjust threshold for TSCC

Fixes: Timeout
Fixes: 31850/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TSCC_fuzzer-5940231289307136
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/rv10: Execute whole size check earlier for rv20

Fixes: Timeout
Fixes: 31380/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_RV20_fuzzer-5230899257016320
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avformat/cafdec: Check channels

Fixes: signed integer overflow: -1184429040541376544 * 32 cannot be represented in type 'long'
Fixes: 31788/clusterfuzz-testcase-minimized-ffmpeg_dem_CAF_fuzzer-6236746338664448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

doc/filters: note requirements for xfade inputs

avutil/cpu: Schedule deprecated functions for removal

av_set_cpu_flags_mask() has been deprecated in the commit which merged
it: 6df42f98746be06c883ce683563e07c9a2af983f; av_parse_cpu_flags() has
been deprecated in 4b529edff8934c258af95e5acc51f84deea66262.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

Include attributes.h directly

Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

lavc/aarch64: add pred16x16 10-bit functions

Benchmarks:                      A53     A72
pred16x16_dc_10_c:              136.0   124.0
pred16x16_dc_10_neon:           121.2   106.0
pred16x16_horizontal_10_c:      155.0    73.2
pred16x16_horizontal_10_neon:    82.2    67.7
pred16x16_top_dc_10_c:          106.0    93.7
pred16x16_top_dc_10_neon:        87.7    77.2
pred16x16_vertical_10_c:         83.0    67.7
pred16x16_vertical_10_neon:      54.2    61.7

Some functions work slower than C and are left commented out.

lavc/aarch64: change h264pred_init structure

Change structure to allow the addition of other bit depths.

doc/muxers.texi: fix build issue for unknown command

The build log:
** Unknown command `@code' (left as is) (in src/doc/muxers.texi l. 2020)
*** '{' without macro. Before: -map} option with the ffmpeg CLI tool. (in src/doc/muxers.texi l. 2020)
*** '}' without opening '{' before: option with the ffmpeg CLI tool. (in src/doc/muxers.texi l. 2020)

avutil/cpu: Use HW_NCPUONLINE to detect # of online CPUs with OpenBSD

Signed-off-by: Brad Smith <brad@comstyle.com>
Signed-off-by: Marton Balint <cus@passwd.hu>

avfilter/af_mcompand: check allocation results

Fixes the only remaining part of ticket #8931.

Signed-off-by: Marton Balint <cus@passwd.hu>

avformat/hls: check return value of new_init_section()

Fixes part of ticket #8931.

Signed-off-by: Marton Balint <cus@passwd.hu>

avcodec/nvenc: fix lossless tuning logic

Relying on the order of the enum is bad.
It clashes with the new presets having to sit at the end of the list, so
that they can be properly filtered out by the options parser on builds
with older SDKs.

So this refactors nvenc.c to instead rely on the internal NVENC_LOSSLESS
flag. For this, the preset mapping has to happen much earlier, so it's
moved from nvenc_setup_encoder to nvenc_setup_device and thus runs
before the device capability check.

lavc/libaomdec: fix build with 1.0.0

aom_codec_frame_flags_t in libaom 1.0.0 is defined in aom_encoder.h,
whereas for newer versions it was moved to aom_codec.h

lavu/detection_bbox.h: use AV_NUM_DETECTION_BBOX_CLASSIFY to replace AV_NUM_BBOX_CLASSIFY

avformat/utils: Combine identical statements

This would only make a difference in case the first attempt to
initialize the encoder failed and the second succeeded. The only
reason I can think of for this to happen is that the options (in
particular the codec whitelist) are not used for the second try
and that obviously implies that we should not even try a second time
to open the decoder.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/jpeglsdec: Don't allocate+free JPEGLSState for every frame

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/matroskaenc: Remove unnecessary function calls

ffio_fill() is used when initially writing unknown length elements;
yet it can happen that the amount of bytes written by it is zero in
which case it is of course unnecessary to ever call it. Whether it is
possible to know this during compiletime depends upon how aggressively
the compiler inlines function calls (i.e. if it inlines calls to
start_ebml_master() where the upper bound for the size of the element
implies that the size will be written on one byte) and this depends upon
optimization settings. It is not the aim of this patch to inline all
calls where it is known that ffio_fill() will be unnecessary, but merely
to make compilers that inline such calls aware of the fact that writing
zero bytes with ffio_fill() is unnecessary. To this end
av_builtin_constant_p() is used to check whether the size is a
compiletime constant.

For GCC 10 this made a difference at -O3 only: The size of .text
decreased from 0x747F (with 29 calls to ffio_fill(), eight of which
use size zero) to 0x7337 (with 21 calls to ffio_fill(), zero of which
use size zero).
For Clang 11 it made a difference at -O2 and -O3: At -O2, the size of
.text decreased from 0x879C to 0x871C (with eight calls to ffio_fill()
eliminated); at -O3 the size of .text decreased from 0xAF2F to 0xAEBF.
Once again, eight calls to ffio_fill() with size zero have been
eliminated.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/wavdec: Fix reading files with id3v2 apic before fmt tag

Up until now the cover images will get the stream index 0 in this case,
violating the hardcoded assumption that this is the index of the audio
stream. Fix this by creating the audio stream first; this is also in
line with the expectations of ff_pcm_read_seek() and
ff_spdif_read_packet(). It also simplifies the code to parse the fmt and
xma2 tags.

Fixes #8540; regression since f5aad350d3695b5b16e7d135154a4c61e4dce9d8.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/segment: Use ff_stream_encode_params_copy()

It is simpler and more complete (e.g. it copies the id).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/webm_chunk: Use ff_stream_encode_params_copy()

It is simpler and more complete (e.g. it copies the framerate
information which allows to write the default duration element).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/id3v2: Don't reverse the order of id3v2 APICs

When parsing ID3v2 tags, special (non-text) metadata is not applied
directly and unconditionally; instead it is stored in a linked list
in which elements are prepended. When traversing the list to add APICs
(or private tags) at the end, the order is reversed. The same also
happens for chapters and therefore the chapter parsing code already
reverses the chapters.

This commit changes this: By keeping pointers to both head and tail
of the linked list one can preserve the order of the entries and
remove the reordering code for chapters. Only the pointer to head
will be exported: No current caller uses a nonempty list, so exporting
both head and tail is unnecessary. This removes the functionality
to combine the lists of special metadata read from different ID3v2 tags,
but that doesn't make really much sense anyway (and would be trivial
to implement if desired) and allows to remove the now unnecessary
initializations performed by the callers.

The FATE-reference for the id3v2-priv test had to be updated
because the order of the tags read into the dict is reversed;
for id3v2-priv-remux only the md5 and not the ffprobe output
of the remuxed file changes because the order of the private tags
has up until now been reversed twice.

The references for the aiff/mp3 cover-art tests needed to be updated,
because the order of the attached pics is reversed upon reading.
It is still not correct, because the muxers write the pics in the order
in which they arrive at the muxer instead of the order given by
pkt->stream_index.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/aiffenc: Avoid seek when writing id3v2 tags at the end

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/aiffenc: Remove always-false check

write_header() already checks that there are only video tracks besides
the one audio track.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

fate/id3v2: Add test for id3v2 chapters

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

fate/id3v2: Add a test for remuxing id3v2 private tags

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

fate/cover-art: Add test for writing id3v2 tags and apic with AIFF/MP3

Notice that the order of the APIC tracks is currently wrong. This is
a superposition of two bugs: (i) Both muxers write the attached
pictures in the order they arrive in the muxer and not in the
stream_index order, leading to attached pictures that are copied being
written earlier because their timestamp is AV_NOPTS_VALUE, whereas the
timestamp of the encoded pictures is 0. (ii) A bug in the id3v2 parsing
code reverses the order of the parsed pictures.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

fate/mov: Add test for muxing cover images

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

fate/filter-video: Remove SAMPLES depedency from refcmp tests

They don't need it as they use the lavfi device to create their samples.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/exr: increase vlc depth

Fixes: shift exponent -4 is negative
Fixes: 32265/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_fuzzer-465133454137753
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

avcodec/dpx: Check bits_per_color earlier

Fixes: shift exponent 251 is too large for 32-bit type 'int'
Fixes: 32147/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DPX_fuzzer-5519111675314176
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

lavu/detection_bboxes: add missing space

Could at least maintainers with push access follow the code styles
we have?

lavfi: add filter dnn_detect for object detection

Below are the example steps to do object detection:

1. download and install l_openvino_toolkit_p_2021.1.110.tgz from
https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html
  or, we can get source code (tag 2021.1), build and install.
2. export LD_LIBRARY_PATH with openvino settings, for example:
.../deployment_tools/inference_engine/lib/intel64/:.../deployment_tools/inference_engine/external/tbb/lib/
3. rebuild ffmpeg from source code with configure option:
--enable-libopenvino
--extra-cflags='-I.../deployment_tools/inference_engine/include/'
--extra-ldflags='-L.../deployment_tools/inference_engine/lib/intel64'
4. download model files and test image
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.bin
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.xml
wget
https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.label
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/images/cici.jpg
5. run ffmpeg with:
./ffmpeg -i cici.jpg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:input=data:output=detection_out:confidence=0.6:labels=face-detection-adas-0001.label,showinfo -f null -

We'll see the detect result as below:
[Parsed_showinfo_1 @ 0x560c21ecbe40]   side data - detection bounding boxes:
[Parsed_showinfo_1 @ 0x560c21ecbe40] source: face-detection-adas-0001.xml
[Parsed_showinfo_1 @ 0x560c21ecbe40] index: 0,  region: (1005, 813) -> (1086, 905), label: face, confidence: 10000/10000.
[Parsed_showinfo_1 @ 0x560c21ecbe40] index: 1,  region: (888, 839) -> (967, 926), label: face, confidence: 6917/10000.

There are two faces detected with confidence 100% and 69.17%.

Signed-off-by: Guo, Yejun <yejun.guo@intel.com>

lavfi: show side data of detection bounding boxes

lavu: add side data AV_FRAME_DATA_DETECTION_BBOXES for object detection/classification

avcodec/libaomdec: export frame pict_type

Should fix ticket #9180

Signed-off-by: James Almer <jamrial@gmail.com>

avformat/mpegts: set correct extradata size for Opus streams

map_type 0 is always 19 bytes, whereas map_type 1 and 255 are 21 + channel
count bytes.

Signed-off-by: James Almer <jamrial@gmail.com>

avformat/mpegts: add missing sample_rate value to Opus extradata

Finishes fixing ticket #9190.

Signed-off-by: James Almer <jamrial@gmail.com>

avformat/movenc: fix writing dOps atoms

Don't blindly copy all bytes in extradata past ChannelMappingFamily. Instead
check if ChannelMappingFamily is not 0 and then only write the correct amount
of bytes from ChannelMappingTable, as defined in the spec[1].

Fixes part of ticket #9190.

[1] https://opus-codec.org/docs/opus_in_isobmff.html#4.3.2

Signed-off-by: James Almer <jamrial@gmail.com>

avformat/webpenc: don't assume animated webp streams will have more than one packet

The libwebp_animencoder returns a single packet with the entire animated
stream, as that's what the external library produces. As such, only ensure the
stream was produced by said encoder (or propagated by a demuxer, once support
is added) when attempting to write the requested loop value.

Fixes ticket #9179.

Signed-off-by: James Almer <jamrial@gmail.com>

avcodec/libwebpenc_animencoder: set the correct packet pts

The only packet produced by this encoder contains the entire animated stream,
so set its pts to the first frame encoded.

Signed-off-by: James Almer <jamrial@gmail.com>

avcodec/libwebpenc_animencoder: stop propagating bogus empty packets

Packets must have at least one of data or side_data. If none are available,
then got_packet must not be signaled.

The generic encode code already discarded these empty packets, but it's better
just not propagating them at all.

Signed-off-by: James Almer <jamrial@gmail.com>

doc/ffprobe.xsd: Clean-up choice indicator definitions

Remove the unneeded wrapping sequence element. Also the
minOccurs/maxOccurs occurrence indicators on the inner element
definitions can be removed.

Signed-off-by: Tobias Rapp <t.rapp@noa-archive.com>

fftools/ffprobe: Remove check on show_frames and show_packets in XML writer

The "packets_and_frames" element has been added to ffprobe.xsd in
0c9f0da0f7656059e9bd41931d250aafddf35ea3 but apparently removing the
check in ffprobe.c has been forgotten.

Signed-off-by: Tobias Rapp <t.rapp@noa-archive.com>

avformat/rawenc: remove singlejpeg muxer

It was added in 51ac1f616f due to ticket #4218, in order to show a single
image via ffserver. With ffserver long gone, it serves no purpose.

avcodec/nellymoserenc: Fix segfault when using unsupported channels/rate

NellyMoserEncodeContext.avctx is only set in init after these checks,
yet it is used by encode_end().
This is a regression since 0a56bfa71f751a2b25da8d060a019c1c75ca9d7b.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

mov: Prioritize aspect ratio values found in pasp atom

From the ISO/IEC specification for MP4:
  The pixel aspect ratio and clean aperture of the video may be specified
  using the ‘pasp’ and ‘clap’ sample entry boxes, respectively. These are
  both optional; if present, they over-ride the declarations (if any) in
  structures specific to the video codec, which structures should be
  examined if these boxes are absent. For maximum compatibility, these
  boxes should follow, not precede, any boxes defined in or required by
  derived specifications.

Fixes trac/#7277.

avcodec/mpeg4videodec: update exported AVOptions in the user-facing context

This prevents bogus values being reported on frame multithreaded decoding
scenarios.

Signed-off-by: James Almer <jamrial@gmail.com>

avcodec/h264dec: update exported AVOptions in the user-facing context

Based on a patch by Hendrik Leppkes.

Fixes ticket #9176.

Signed-off-by: James Almer <jamrial@gmail.com>

pthread_frame: introduce a codec callback to update the user-facing context

ffprobe: only print exported private decoder options

Signed-off-by: James Almer <jamrial@gmail.com>

avcodec/h264dec: add missing flags to is_avc and nal_length_size AVOptions

Signed-off-by: James Almer <jamrial@gmail.com>

aarch64: h264pred: Optimize the inner loop of existing 8 bit functions

Move the loop counter decrement further from the branch instruction,
this hides the latency of the decrement.

In loops that first load, then store (the horizontal prediction cases),
do the decrement after the load (where the next instruction would
stall a bit anyway, waiting for the result of the load).

In loops that store twice using the same destination register,
also do the decrement between the two stores (as the second store
would need to wait for the updated destination register from the
first instruction).

In loops that store twice to two different destination registers,
do the decrement before both stores, to do it as soon before the
branch as possible.

This gives minor (1-2 cycle) speedups in most cases (modulo measurement
noise), but the horizontal prediction functions get a rather notable
speedup on the Cortex A53.

Before:                     Cortex A53     A72     A73
pred8x8_dc_8_neon:                60.7    46.2    39.2
pred8x8_dc_128_8_neon:            30.7    18.0    14.0
pred8x8_horizontal_8_neon:        42.2    29.2    18.5
pred8x8_left_dc_8_neon:           52.7    36.2    32.2
pred8x8_mad_cow_dc_0l0_8_neon:    48.2    27.7    25.7
pred8x8_mad_cow_dc_0lt_8_neon:    52.5    33.2    34.7
pred8x8_mad_cow_dc_l0t_8_neon:    52.5    31.7    33.2
pred8x8_mad_cow_dc_l00_8_neon:    43.2    27.0    25.5
pred8x8_plane_8_neon:            112.2    86.2    88.2
pred8x8_top_dc_8_neon:            40.7    23.0    21.2
pred8x8_vertical_8_neon:          27.2    15.5    14.0
pred16x16_dc_8_neon:              91.0    73.2    70.5
pred16x16_dc_128_8_neon:          43.0    34.7    30.7
pred16x16_horizontal_8_neon:      86.0    49.7    44.7
pred16x16_left_dc_8_neon:         87.0    67.2    67.5
pred16x16_plane_8_neon:          236.0   175.7   173.0
pred16x16_top_dc_8_neon:          53.2    39.0    41.7
pred16x16_vertical_8_neon:        41.7    29.7    31.0

After:
pred8x8_dc_8_neon:                59.0    46.7    42.5
pred8x8_dc_128_8_neon:            28.2    18.0    14.0
pred8x8_horizontal_8_neon:        34.2    29.2    18.5
pred8x8_left_dc_8_neon:           51.0    38.2    32.7
pred8x8_mad_cow_dc_0l0_8_neon:    46.7    28.2    26.2
pred8x8_mad_cow_dc_0lt_8_neon:    55.2    33.7    37.5
pred8x8_mad_cow_dc_l0t_8_neon:    51.2    31.7    37.2
pred8x8_mad_cow_dc_l00_8_neon:    41.7    27.5    26.0
pred8x8_plane_8_neon:            111.5    86.5    89.5
pred8x8_top_dc_8_neon:            39.0    23.2    21.0
pred8x8_vertical_8_neon:          27.2    16.0    14.0
pred16x16_dc_8_neon:              85.0    70.2    70.5
pred16x16_dc_128_8_neon:          42.0    30.0    30.7
pred16x16_horizontal_8_neon:      66.5    49.5    42.5
pred16x16_left_dc_8_neon:         81.0    66.5    67.5
pred16x16_plane_8_neon:          235.0   175.7   173.0
pred16x16_top_dc_8_neon:          52.0    39.0    41.7
pred16x16_vertical_8_neon:        40.2    33.2    31.0

Despite this, a number of these functions still are slower than
what e.g. GCC 7 generates - this shows the relative speedup of the
neon codepaths over the compiler generated ones:

                           Cortex A53    A72    A73
pred8x8_dc_8_neon:               0.86   0.65   1.04
pred8x8_dc_128_8_neon:           0.59   0.44   0.62
pred8x8_horizontal_8_neon:       1.51   0.58   1.30
pred8x8_left_dc_8_neon:          0.72   0.56   0.89
pred8x8_mad_cow_dc_0l0_8_neon:   0.93   0.93   1.37
pred8x8_mad_cow_dc_0lt_8_neon:   1.37   1.41   1.68
pred8x8_mad_cow_dc_l0t_8_neon:   1.21   1.17   1.32
pred8x8_mad_cow_dc_l00_8_neon:   1.24   1.19   1.60
pred8x8_plane_8_neon:            3.36   3.58   3.76
pred8x8_top_dc_8_neon:           0.97   0.99   1.43
pred8x8_vertical_8_neon:         0.86   0.78   1.18
pred16x16_dc_8_neon:             1.20   1.06   1.49
pred16x16_dc_128_8_neon:         0.83   0.95   0.99
pred16x16_horizontal_8_neon:     1.78   0.96   1.59
pred16x16_left_dc_8_neon:        1.06   0.96   1.32
pred16x16_plane_8_neon:          5.78   6.49   7.19
pred16x16_top_dc_8_neon:         1.48   1.53   1.94
pred16x16_vertical_8_neon:       1.39   1.34   1.98

In particular, on Cortex A72, many of these functions are slower
than the compiler generated code, while they're more beneficial on
e.g. the Cortex A73.

Signed-off-by: Martin Storsjö <martin@martin.st>

avformat/dashdec: Also fetch final partial segment

Currently, the DASH demuxer omits the final segment for a non-live
stream (using SegmentTemplate) if it is shorter than the other segments.

Correct calc_max_seg_no to round up when calulating the number of
segments instead of rounding down to resolve this issue.

Signed-off-by: Matt Robinson <git@nerdoftheherd.com>

fftools/ffmpeg_filter: Fix check for mjpeg encoder

The MJPEG encoder supports some pixel format/color range combinations
only when strictness is set to unofficial or less. Before commit
059fc2d9da5364627613fb3e6424079e14dbdfd3 said encoder's pix_fmts array
only included the pixel formats supported with default strictness.
When strictness was <= unofficial, fftools/ffmpeg_filter.c used
an extended list of pixel formats instead of the encoder's including
the pixel formats only supported when strictness <= unofficial.

Said commit turned the logic around: The encoder's pix_fmts array now
included all pixel formats and fftools/ffmpeg_filter.c instead used
a small list of all pixel formats supported when strictness is >
unofficial and the encoder's pixel formats instead. In particular,
the codec's pix_fmt is not used when strictness is normal.

This works for the mjpeg encoder; yet it did not work for other
(hardware-based) mjpeg encoders, because the check for whether one is
using the MJPEG encoder is wrong: It just checks the codec id.
So if one used strict unofficial with a hardware-accelerated MJPEG
encoder before commit 059fc2d9da53, the unofficial (non-hardware)
pixel formats of the MJPEG encoder would be used; since said commit
the codec's pixel formats are overridden at ordinary strictness
by the ordinary MJPEG pixel formats. This leads to format conversion
errors lateron which were reported in #9186.

The solution to this is to check AVCodec.name instead of its id.

Fixes ticket #9186.

Tested-by: Eoff, Ullysses A <ullysses.a.eoff@intel.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

libavdevice/gdigrab: fix capture of windows with non-ASCII titles

Properly convert the UTF-8 input string to Windows wchar, and
utilize the wchar version of FindWindow.

Signed-off-by: He Yang <1160386205@qq.com>

avcodec/jpeglsenc: Remove redundant pixel format checks

This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
checks for this.

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/matroskaenc: Put subtitles without duration into SimpleBlocks

The value zero for AVPacket.duration means that the duration is unknown,
which in practice means "play this subtitle until overridden by the next
subtitle". Yet for Matroska a BlockGroup with duration zero means
that the subtitle really has a duration zero. "Display until overridden"
is achieved by not setting a duration on the container level at all and
this is achieved by using a SimpleBlock or a BlockGroup without
duration. This commit implements this.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/matroskaenc: Remove remnant of inline-timing subtitle packets

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/msmpeg4dec: Avoid duplication of VLC init code

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/vc1: Remove unused hrd fields

Unused since be3492ec7eb2dbb0748c123af911a06c125c90db.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/mss2: Remove redundant initialization of vc1 dsp

ff_vc1_init_common() already does it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/vc1: Don't pretend ff_vc1_init_common() can fail

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>

avcodec/msmpeg4enc: Remove dead code for inexistent VC-1 encoder

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/h263dec, mpeg12dec: Remove redundant writes

ff_mpv_decode_init() already sets MpegEncContext.codec_id.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avcodec/rv34: Move dsp init code to rv30/rv40

It avoids both runtime and compile-time checks.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

avformat/mvi: Check audio_data_size to be non negative

Fixes: left shift of negative value -224
Fixes: 32144/clusterfuzz-testcase-minimized-ffmpeg_dem_MVI_fuzzer-4971479323246592
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

lavfi/qsvvpp: support async depth

Async depth will allow qsv filter cache few frames, and avoid force
switch and end filter task frame by frame. This change will improve
performance for some multi-task case, for example 1:N transcode(
decode + vpp + encode) with all QSV plugins.

Performance data test on my Coffee Lake Desktop(i7-8700K) by using
the following 1:8 transcode test case improvement:
1. Fps improved from 55 to 130.
2. Render/Video usage improved from ~61%/~38% to ~100%/~70%.(Data get
from intel_gpu_top)

test CMD:
ffmpeg -v verbose -init_hw_device qsv=hw:/dev/dri/renderD128 -filter_hw_device \
hw -hwaccel qsv -hwaccel_output_format qsv -c:v h264_qsv -i 1920x1080.264 \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null - \
-vf 'vpp_qsv=w=1280:h=720:async_depth=4' -c:v h264_qsv -r:v 30 -preset 7 -g 33 -refs 2 -bf 3 -q 24 -f null -

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Reviewed-by: Linjie Fu <linjie.justin.fu@gmail.com>
Signed-off-by: Zhong Li <zhongli_dev@126.com>

avformat/rawenc: perform stream checks for mp2 muxer

avformat/img2dec: set r_frame_rate in addition to avg_frame_rate

Apparently for various image sequences libavformat/utils.c can
calculate rather fancy r_frame_rate values, such as `186/1921`,
and since ffmpeg.c utilizes r_frame_rate for the filter chain
time base, this can quite deteriorate the output frame timing - even
though the user has requested the image sequence to be interpreted
at a specific, constant frame rate.

avfilter/overlay_cuda: check av_buffer_ref result

avfilter/vf_v360: unbreak fov_from_dfov() for (d)fisheye when width != height

Based on patch by Daniel Playfair Cal.