Mans Rullgard [Thu, 2 Aug 2012 22:18:08 +0000 (23:18 +0100)]
ARMv6: vp8: fix stack allocation with Apple's assembler
In the GNU assembler, a relational expression, bizarrely, has the
value -1 if true, whereas in Apple's it is +1. This patch makes
sure the correct expression is used in both cases.
Mans Rullgard [Thu, 2 Aug 2012 21:53:47 +0000 (22:53 +0100)]
ARM: vp56: allow inline asm to build with clang
The clang integrated assembler does not support pre-UAL syntax,
while gcc requires pre-UAL syntax for ARM code. A patch[1] for
clang to support the old syntax as well has been ignored since
January.
This patch chooses the syntax appropriate for each compiler,
allowing both to build the code. Notably, this change allows
building for iphone with the latest Apple Xcode update.
Diego Biurrun [Sun, 8 Jul 2012 16:42:12 +0000 (18:42 +0200)]
x86: build: replace mmx2 by mmxext
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
For left HFYU prediction, we predict from the buffer buf+1 using 8- or
16-byte reads. This means that aligning the buffer by 16 bytes is in
itself not sufficient, because if the width itself is 16- or 8-byte
aligned, the buffer will not be padded, and thus a read of size 16 at
buf+1 will overflow boundaries at the right edge. Padding the buffer by
1 byte is sufficient to not overflow its boundaries.
dsputil: make add_hfyu_left_prediction_sse4() support unaligned src.
This makes add_hfyu_left_prediction_sse4() handle sources that are not
16-byte aligned in its own function rather than by proxying the call to
add_hfyu_left_prediction_ssse3(). This fixes a crash on Win64, since the
sse4 version clobberes xmm6, but the ssse3 version (which uses MMX regs)
does not restore it, thus leading to XMM clobbering and RSP being off.
vc1dec: Invoke edge_emulation regardless of MV precision
In VC-1 interlaced field pictures, chroma motion vectors can extend beyond
picture boundary even if luma vectors are bounded. The problem shows up
only for hpel interpolated MVs, and may be due to the way motion vectors
are scaled / cropped.
Thanks to Konstantin Shishkov for suggesting the fix. This fixes
long-known segfaults in MC-VC1.ts from videolan streams archive.
Diego Biurrun [Wed, 1 Aug 2012 13:31:43 +0000 (15:31 +0200)]
x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
Kostya Shishkov [Thu, 2 Aug 2012 17:15:51 +0000 (19:15 +0200)]
g723_1: increase excitation storage by 4
Fixed codebook mode in 5300 rate may write up to SUBFRAME_LEN + 4 and
that is considered normal by the reference decoder. Without that additional
padding it might overwrite first elements of LPC history.
Some calculations were changed in b6a3849 to use mmsize, which was not correct
for the AVX version, which uses INIT_YMM and therefore has mmsize == 32.
Mans Rullgard [Wed, 1 Aug 2012 13:32:19 +0000 (14:32 +0100)]
dct-test: always link with aandcttab.o
This allows building dct-test even if aandcttab.o is not pulled in
by any enabled codec. The DCT with which these tables are used does
not use them directly, so building it without the tables is possible.
Mans Rullgard [Wed, 1 Aug 2012 13:01:08 +0000 (14:01 +0100)]
vp8: pack struct VP8ThreadData more efficiently
Reordering the members in this struct reduces the holes required
to maintain alignment. With this order, the only remaining, and
unavoidable, hole is 3 bytes following left_nnz.
Mans Rullgard [Wed, 1 Aug 2012 12:16:23 +0000 (13:16 +0100)]
x86: remove libmpeg2 mmx(ext) idct functions
These functions are not faster than other mmx implementations on
any hardware I have been able to test on, and they are horribly
inaccurate. There is thus no reason to ever use them.
Mans Rullgard [Tue, 31 Jul 2012 22:58:58 +0000 (23:58 +0100)]
ARM: use standard syntax for all LDRD/STRD instructions
The standard syntax requires two destination registers for
LDRD/STRD instructions. Some versions of the GNU assembler
allow using only one with the second implicit, others are
more strict.
Ronald S. Bultje [Sat, 28 Jul 2012 17:11:00 +0000 (10:11 -0700)]
h264: convert loop filter strength dsp function to yasm.
This completes the conversion of h264dsp to yasm; note that h264 also
uses some dsputil functions, most notably qpel. Performance-wise, the
yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
faster (201->193) on x86-32.
Mans Rullgard [Sun, 29 Jul 2012 12:09:10 +0000 (13:09 +0100)]
eamad/eatgq/eatqi: call special EA IDCT directly
These decoders use a special non-MPEG2 IDCT. Call it directly
instead of going through dsputil. There is never any reason
to use a regular IDCT with these decoders or to use the EA IDCT
with other codecs.
This also fixes the bizarre situation of eamad and eatqi decoding
incorrectly if eatgq is disabled.
This allows for padding/trimming at the start of stream. By default, no
assumption is made about the first frame's expected pts, so no padding or
trimming is done.
Anton Khirnov [Tue, 26 Jun 2012 11:10:01 +0000 (13:10 +0200)]
lavf: deprecate r_frame_rate.
According to its description, it is supposed to be the LCM of all the
frame durations. The usability of such a thing is vanishingly small,
especially since we cannot determine it with any amount of reliability.
Therefore get rid of it after the next bump.
Replace it with the average framerate where it makes sense.
FATE results for the wtv and xmv demux tests change. In the wtv case
this is caused by the file being corrupted (or possibly badly cut) and
containing invalid timestamps. This results in lavf estimating the
framerate wrong and making up wrong frame durations.
In the xmv case the file contains pts jumps, so again the estimated
framerate is far from anything sane and lavf again makes up different
frame durations.
In some other tests lavf starts making up frame durations from different
frame.
Anton Khirnov [Thu, 28 Jun 2012 13:49:51 +0000 (15:49 +0200)]
lavf: use dts difference instead of AVPacket.duration in find_stream_info()
AVPacket.duration is mostly made up and thus completely useless, this is
especially true for video streams.
Therefore use dts difference for framerate estimation and
the max_analyze_duration check.
The asyncts test now needs -analyzeduration, because the default is 5
seconds and the audio stream in the sample appears at ~10 seconds.
Ronald S. Bultje [Sat, 28 Jul 2012 15:20:19 +0000 (08:20 -0700)]
fft: rename "z" to "zc" to prevent name collision.
Without this, cglobal will expand "z" to "zh" to access the high byte
in a register's word, which causes a name collision with the ZH(x) macro
further up in this file.