avcodec/msmpeg4: Factor out common RLTable initialization code
Up until now, both the msmpeg4 decoders and encoders initialized several
RLTables common to them (the decoders also initialized the VLCs of these
RLTables). This is an obstacle to making these codecs init-threadsafe.
So move this initialization to ff_msmpeg4_common_init() that already
contains this initialization code. This allows to reuse the AVOnce used
for initializing ff_v2_dc_lum/chroma_table which automatically makes
initializing these RLTables thread-safe.
Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
avcodec/mpeg4video: Make initializing RLTable thread-safe
Up until now the RLTable ff_mpeg4_rl_intra was initialized by both mpeg4
decoder and encoder (except the VLCs that are only used by the decoder).
This is an obstacle to making these codecs init-threadsafe, so move
initializing this to a single function that is guarded by a dedicated
AVOnce.
Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
avcodec/ituh263enc: Make static initializations thread-safe
This already makes several encoders (namely FLV, H.263, H.263+ and
RealVideo 1.0 and 2.0 and SVQ1) that use this init-threadsafe.
It also makes the Snow encoder init-threadsafe; it was already marked
as such since commit d49210788b0836d56dd872d517fe73f83b080101, because
it was thought to be harmless if one and the same object was
initialized by multiple threads at the same time.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
avcodec/encode: Zero padding in ff_get_encode_buffer()
The documentation of the get_encode_buffer() callback does not require
to zero the padding; therefore we do it in ff_get_encode_buffer().
This also constitutes an implicit check for whether the buffer is
actually allocated with padding.
The memset in avcodec_default_get_encode_buffer() is now redundant and
has been removed.
Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Jan Ekström [Mon, 29 Mar 2021 14:19:58 +0000 (17:19 +0300)]
avcodec/ttmlenc: add support for region positioning and sizing
The ASS margins are utilized to generate percentual values, as
the usage of cell-based sizing and offsetting seems to be not too
well supported by renderers.
Jan Ekström [Mon, 29 Mar 2021 13:34:34 +0000 (16:34 +0300)]
avcodec/ttmlenc: add initial support for regions and styles
Attempts to utilize the TTML cell resolution as a mapping to the
reference resolution, and maps font size to cell size. Additionally
sets the display and text alignment according to the ASS alignment
number.
Jan Ekström [Mon, 15 Mar 2021 14:01:52 +0000 (16:01 +0200)]
avformat/ttmlenc: enable writing out additional header values
This way the encoder may pass on the following values to the muxer:
1) Additional root "tt" element attributes, such as the subtitle
canvas reference size.
2) Anything before the body element of the document, such as regions
in the head element, which can configure styles.
James Almer [Wed, 21 Apr 2021 16:33:33 +0000 (13:33 -0300)]
avcodec/mjpegdec: postpone calling ff_get_buffer() until the SOS marker
With JPEG-LS PAL8 samples, the JPEG-LS extension parameters signaled with
the LSE marker show up after SOF but before SOS. For those, the pixel format
chosen by get_format() in SOF is GRAY8, and then replaced by PAL8 in LSE.
This has not been an issue given both pixel formats allocate the second data
plane for the palette, but after the upcoming soname bump, GRAY8 will no longer
do that. This will result in segfauls when ff_jpegls_decode_lse() attempts to
write the palette on a buffer originally allocated as a GRAY8 one.
Work around this by calling ff_get_buffer() after the actual pixel format is
known.
This commit adds a pure x86 assembly SIMD version of the FFT in libavutil/tx.
The design of this pure assembly FFT is pretty unconventional.
On the lowest level, instead of splitting the complex numbers into
real and imaginary parts, we keep complex numbers together but split
them in terms of parity. This saves a number of shuffles in each transform,
but more importantly, it splits each transform into two independent
paths, which we process using separate registers in parallel.
This allows us to keep all units saturated and lets us use all available
registers to avoid dependencies.
Moreover, it allows us to double the granularity of our per-load permutation,
skipping many expensive lookups and allowing us to use just 4 loads per register,
rather than 8, or in case FMA3 (and by extension, AVX2), use the vgatherdpd
instruction, which is at least as fast as 4 separate loads on old hardware,
and quite a bit faster on modern CPUs).
Higher up, we go for a bottom-up construction of large transforms, foregoing
the traditional per-transform call-return recursion chains. Instead, we always
start at the bottom-most basis transform (in this case, a 32-point transform),
and continue constructing larger and larger transforms until we return to the
top-most transform.
This way, we only touch the stack 3 times per a complete target transform:
once for the 1/2 length transform and two times for the 1/4 length transform.
The combination algorithm we use is a standard Split-Radix algorithm,
as used in our C code. Although a version with less operations exists
(Steven G. Johnson and Matteo Frigo's "A modified split-radix FFT with fewer
arithmetic operations", IEEE Trans. Signal Process. 55 (1), 111–119 (2007),
which is the one FFTW uses), it only has 2% less operations and requires at least 4x
the binary code (due to it needing 4 different paths to do a single transform).
That version also has other issues which prevent it from being implemented
with SIMD code as efficiently, which makes it lose the marginal gains it offered,
and cannot be performed bottom-up, requiring many recursive call-return chains,
whose overhead adds up.
We go through a lot of effort to minimize load/stores by keeping as much in
registers in between construcring transforms. This saves us around 32 cycles,
on paper, but in reality a lot more due to load/store aliasing (a load from a
memory location cannot be issued while there's a store pending, and there are
only so many (2 for Zen 3) load/store units in a CPU).
Also, we interleave coefficients during the last stage to save on a store+load
per register.
Each of the smallest, basis transforms (4, 8 and 16-point in our case)
has been extremely optimized. Our 8-point transform is barely 20 instructions
in total, beating our old implementation 8-point transform by 1 instruction.
Our 2x8-point transform is 23 instructions, beating our old implementation by
6 instruction and needing 50% less cycles. Our 16-point transform's combination
code takes slightly more instructions than our old implementation, but makes up
for it by requiring a lot less arithmetic operations.
Overall, the transform was optimized for the timings of Zen 3, which at the
time of writing has the most IPC from all documented CPUs. Shuffles were
preferred over arithmetic operations due to their 1/0.5 latency/throughput.
On average, this code is 30% faster than our old libavcodec implementation.
It's able to trade blows with the previously-untouchable FFTW on small transforms,
and due to its tiny size and better prediction, outdoes FFTW on larger transforms
by 11% on the largest currently supported size.
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
This commit refactors the power-of-two FFT, making it faster and
halving the size of all tables, making the code much smaller on
all systems.
This removes the big/small pass split, because on modern systems
the "big" pass is always faster, and even on older machines there
is no measurable speed difference.
avcodec/avcodec: Actually honour the documentation of subtitle_header
It is only supposed to be freed by libavcodec for decoders, yet
avcodec_open2() always frees it on failure.
Furthermore, avcodec_close() doesn't free it for decoders.
Both of this has been changed.
Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avformat/id3v2: Check end for overflow in id3v2_parse()
Fixes: signed integer overflow: 9223372036840103978 + 67637280 cannot be represented in type 'long' Fixes: 33341/clusterfuzz-testcase-minimized-ffmpeg_dem_DSF_fuzzer-6408154041679872 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: 9223372036854775805 + 4 cannot be represented in type 'long' Fixes: 29927/clusterfuzz-testcase-minimized-ffmpeg_dem_MXF_fuzzer-5579985228267520 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avformat/wtvdec: Improve size overflow checks in parse_chunks()
Fixes: signed integer overflow: 32 + 2147483647 cannot be represented in type 'int Fixes: 32967/clusterfuzz-testcase-minimized-ffmpeg_dem_WTV_fuzzer-5132856218222592 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Reviewed-by: Peter Ross <pross@xvid.org> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avformat/mov: check for pts overflow in mov_read_sidx()
Fixes: signed integer overflow: 9223372036846336888 + 4278255871 cannot be represented in type 'long' Fixes: 32782/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6059216516284416 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avcodec/jpeglsdec: Don't presume the context to contain a JLSState
Before 9b3c46a081a9f01559082bf7a154fc6be1e06c18 every call to
ff_jpegls_decode_picture() allocated and freed a JLSState. This commit
instead put said structure into the context of the JPEG-LS decoder to
avoid said allocation. But said function can also be called from other
MJPEG-based decoders and their contexts doesn't contain said structure,
leading to segfaults. This commit fixes this: The JLSState is now
allocated on the first call to ff_jpegls_decode_picture() and stored in
the context.
Found-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avcodec/utils: Check ima wav duration for overflow
Fixes: signed integer overflow: 44331634 * 65 cannot be represented in type 'int' Fixes: 32120/clusterfuzz-testcase-minimized-ffmpeg_dem_RSD_fuzzer-5760221223583744 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -1184429040541376544 * 32 cannot be represented in type 'long' Fixes: 31788/clusterfuzz-testcase-minimized-ffmpeg_dem_CAF_fuzzer-6236746338664448 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
doc/muxers.texi: fix build issue for unknown command
The build log:
** Unknown command `@code' (left as is) (in src/doc/muxers.texi l. 2020)
*** '{' without macro. Before: -map} option with the ffmpeg CLI tool. (in src/doc/muxers.texi l. 2020)
*** '}' without opening '{' before: option with the ffmpeg CLI tool. (in src/doc/muxers.texi l. 2020)
Relying on the order of the enum is bad.
It clashes with the new presets having to sit at the end of the list, so
that they can be properly filtered out by the options parser on builds
with older SDKs.
So this refactors nvenc.c to instead rely on the internal NVENC_LOSSLESS
flag. For this, the preset mapping has to happen much earlier, so it's
moved from nvenc_setup_encoder to nvenc_setup_device and thus runs
before the device capability check.
This would only make a difference in case the first attempt to
initialize the encoder failed and the second succeeded. The only
reason I can think of for this to happen is that the options (in
particular the codec whitelist) are not used for the second try
and that obviously implies that we should not even try a second time
to open the decoder.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avformat/matroskaenc: Remove unnecessary function calls
ffio_fill() is used when initially writing unknown length elements;
yet it can happen that the amount of bytes written by it is zero in
which case it is of course unnecessary to ever call it. Whether it is
possible to know this during compiletime depends upon how aggressively
the compiler inlines function calls (i.e. if it inlines calls to
start_ebml_master() where the upper bound for the size of the element
implies that the size will be written on one byte) and this depends upon
optimization settings. It is not the aim of this patch to inline all
calls where it is known that ffio_fill() will be unnecessary, but merely
to make compilers that inline such calls aware of the fact that writing
zero bytes with ffio_fill() is unnecessary. To this end
av_builtin_constant_p() is used to check whether the size is a
compiletime constant.
For GCC 10 this made a difference at -O3 only: The size of .text
decreased from 0x747F (with 29 calls to ffio_fill(), eight of which
use size zero) to 0x7337 (with 21 calls to ffio_fill(), zero of which
use size zero).
For Clang 11 it made a difference at -O2 and -O3: At -O2, the size of
.text decreased from 0x879C to 0x871C (with eight calls to ffio_fill()
eliminated); at -O3 the size of .text decreased from 0xAF2F to 0xAEBF.
Once again, eight calls to ffio_fill() with size zero have been
eliminated.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avformat/wavdec: Fix reading files with id3v2 apic before fmt tag
Up until now the cover images will get the stream index 0 in this case,
violating the hardcoded assumption that this is the index of the audio
stream. Fix this by creating the audio stream first; this is also in
line with the expectations of ff_pcm_read_seek() and
ff_spdif_read_packet(). It also simplifies the code to parse the fmt and
xma2 tags.
avformat/id3v2: Don't reverse the order of id3v2 APICs
When parsing ID3v2 tags, special (non-text) metadata is not applied
directly and unconditionally; instead it is stored in a linked list
in which elements are prepended. When traversing the list to add APICs
(or private tags) at the end, the order is reversed. The same also
happens for chapters and therefore the chapter parsing code already
reverses the chapters.
This commit changes this: By keeping pointers to both head and tail
of the linked list one can preserve the order of the entries and
remove the reordering code for chapters. Only the pointer to head
will be exported: No current caller uses a nonempty list, so exporting
both head and tail is unnecessary. This removes the functionality
to combine the lists of special metadata read from different ID3v2 tags,
but that doesn't make really much sense anyway (and would be trivial
to implement if desired) and allows to remove the now unnecessary
initializations performed by the callers.
The FATE-reference for the id3v2-priv test had to be updated
because the order of the tags read into the dict is reversed;
for id3v2-priv-remux only the md5 and not the ffprobe output
of the remuxed file changes because the order of the private tags
has up until now been reversed twice.
The references for the aiff/mp3 cover-art tests needed to be updated,
because the order of the attached pics is reversed upon reading.
It is still not correct, because the muxers write the pics in the order
in which they arrive at the muxer instead of the order given by
pkt->stream_index.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
fate/cover-art: Add test for writing id3v2 tags and apic with AIFF/MP3
Notice that the order of the APIC tracks is currently wrong. This is
a superposition of two bugs: (i) Both muxers write the attached
pictures in the order they arrive in the muxer and not in the
stream_index order, leading to attached pictures that are copied being
written earlier because their timestamp is AV_NOPTS_VALUE, whereas the
timestamp of the encoded pictures is 0. (ii) A bug in the id3v2 parsing
code reverses the order of the parsed pictures.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: shift exponent 251 is too large for 32-bit type 'int' Fixes: 32147/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DPX_fuzzer-5519111675314176 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>