The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Martin Storsjö [Thu, 23 Feb 2012 09:56:15 +0000 (11:56 +0200)]
rtpenc: Expose the max packet size via an avoption
This allows opting for a lower MTU than what the AVIOContext
indicated, and allows writing into outputs that don't indicate
an MTU at all (such as plain files, which is useful for testing).
This also allows querying for the MTU via the avoption.
Diego Biurrun [Thu, 23 Feb 2012 16:17:01 +0000 (17:17 +0100)]
Remove libpostproc.
This library does not fit into Libav as a whole and its code is just a
maintenance burden. Furthermore it is now available as an external project,
which completely obviates any reason to keep it around.
Ronald S. Bultje [Wed, 22 Feb 2012 22:22:56 +0000 (14:22 -0800)]
lcl: don't overwrite input memory.
If the PNG filter is enabled, a PNG-style filter will run over the
input buffer, writing into the buffer. Therefore, if no zlib compression
was used, ensure that we copy into a temporary buffer, otherwise we
overwrite user-provided input data.
Martin Storsjö [Tue, 7 Feb 2012 14:39:14 +0000 (16:39 +0200)]
rtpenc: Allow packetizing H263 according to the old RFC 2190
According to newer RFCs, this packetization scheme should only
be used for interfacing with legacy systems.
Implementing this packetization mode properly requires parsing
the full H263 bitstream to find macroblock boundaries (and knowing
their macroblock and gob numbers and motion vector predictors).
This implementation tries to look for GOB headers (which
can be inserted by using -ps <small number>), but if the GOBs
aren't small enough to fit into the MTU, the packetizer blindly
splits packets at any offset and claims it to be a GOB boundary
(by using Mode A from the RFC). While not correct, this seems
to work with some receivers.
Rafaël Carré [Mon, 6 Feb 2012 21:08:08 +0000 (16:08 -0500)]
dxva2: don't check for DXVA_PictureParameters->wDecodedPictureIndex
This structure is well defined by Microsoft at:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff564012(v=vs.85).aspx
Thus, the wDecodedPictureIndex member is guaranteed to exist.
Also, both the MPEG-2 and VC-1 hwaccel decoders depend on this struct member,
but only the VC-1 decoder was disabled if the check failed.
Ronald S. Bultje [Wed, 22 Feb 2012 19:33:24 +0000 (11:33 -0800)]
rm: prevent infinite loops for index parsing.
Specifically, prevent jumping back in the file for the next index, since
this can lead to infinite loops where we jump between indexes referring
to each other, and don't read indexes that don't fit in the file.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org
Ronald S. Bultje [Tue, 21 Feb 2012 18:36:27 +0000 (10:36 -0800)]
rmdec: when using INT4 deinterleaving, error out if sub_packet_h <= 1.
We read sub_packet_h / 2 packets per line of data (during deinterleaving),
which equals zero if sub_packet_h <= 1, thus causing us to not read any
data, leading to an infinite loop.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind CC: libav-stable@libav.org
Martin Storsjö [Thu, 2 Feb 2012 10:50:26 +0000 (12:50 +0200)]
movenc: Buffer the mdat for the initial moov fragment, too
This allows writing QuickTime-compatible fragmented mp4 (with
a non-empty moov atom) to a non-seekable output.
This buffers the mdat for the initial fragment just as it does
for all normal fragments, too. Previously, the resulting
atom structure was mdat,moov, moof,mdat ..., while it now
is moov,mdat, moof,mdat.
Martin Storsjö [Tue, 21 Feb 2012 10:16:18 +0000 (12:16 +0200)]
movdec: Don't parse all fragments if ignidx is set
In nonseekable files, we already stop parsing the toplevel atoms
after finding moov and one mdat. In large seekable files (or files
that are seekable, but slowly, e.g. http), reading all the fragments
at the start can take a considerable amount of time. This allows
opting out from this behaviour.
Martin Storsjö [Tue, 21 Feb 2012 10:03:56 +0000 (12:03 +0200)]
movdec: Restart parsing root-level atoms at the right spot
If parsing moov+mdat in a non-seekable file, we currently
abort parsing directly after parsing the header of the mdat
atom. If we want to continue parsing later (if looking to
parse later fragments), we need to skip past the content of the
mdat atom, otherwise we end up parsing the content of the mdat
atom as root level atoms.
prores: use natural integer type for the codebook index
The operations that use it require it to be promoted to a larger (natural)
type and thus perform sign extension on it.
While an optimal compiler may account for this, gcc 4.6 (for x86 Windows)
fails. Using the natural integer type provides a 2% speedup for Win64
and 1% for Win32.
Martin Storsjö [Tue, 31 Jan 2012 10:45:02 +0000 (12:45 +0200)]
movdec: Adjust keyframe flagging in fragmented files
For video, mark the first sample in a trun which doesn't have the
sample-is-non-sync-sample flag set as a keyframe.
In particular, the "sample does not depend on other samples" flag
isn't enough to make it a keyframe, since later frames still can
reference frames prior to that one (the flag only says that that
particular frame doesn't depend on other frames).
Ronald S. Bultje [Fri, 17 Feb 2012 06:04:14 +0000 (22:04 -0800)]
rv34: change most "int stride" into "ptrdiff_t stride".
This prevents having to sign-extend on 64-bit systems with 32-bit ints,
such as x86-64. Also fixes crashes on systems where we don't do it and
arguments are not in registers, such as Win64 for all weight functions.