git.sesse.net Git - ffmpeg/commit

author	Clément Bœsch <cboesch@gopro.com>
	Thu, 22 Jun 2017 09:04:26 +0000 (11:04 +0200)
committer	Clément Bœsch <u@pkh.me>
	Wed, 28 Jun 2017 09:59:34 +0000 (11:59 +0200)
commit	e4a27e2f2dea60fb0cce6e555a6a8296e50edc54
tree	627422a43dd9181b9af4fdc15461e0de093c66a2
parent	d2ef9e6e7f9ef71aae15e9493189515a857928b1

lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon

The code originally pre-multiplied the steps by 2, causing the running sum
of the h factors to drift due to the lack of precision. This quickly
causes an inaccuracy > 0.01.
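
The rounding effect behind this can be sketched in plain C. This is only an
illustration under assumptions: the function names are hypothetical, and the
"per pair" variant is a simplified stand-in for a loop that advances by a
doubled step, not a transcription of the original NEON code. Adding step
twice rounds twice, while adding 2*step rounds once, so the two running sums
can end up on different float values after many iterations.

```c
/* Hypothetical helpers, not part of FFmpeg. */

/* Running sum advanced once per sample, as a scalar reference would do. */
static float accumulate_per_sample(float h, float step, int n)
{
    for (int i = 0; i < n; i++)
        h += step;              /* one rounding per sample */
    return h;
}

/* Running sum advanced by a pre-doubled step, once per pair of samples.
 * n is assumed to be even. */
static float accumulate_per_pair(float h, float step, int n)
{
    float step2 = step + step;  /* doubling is exact in binary floating point */
    for (int i = 0; i < n / 2; i++)
        h += step2;             /* one rounding per two samples */
    return h;
}

/* Absolute difference between the two running sums after n samples. */
static float rounding_gap(float h, float step, int n)
{
    float a = accumulate_per_sample(h, step, n);
    float b = accumulate_per_pair(h, step, n);
    return a > b ? a - b : b - a;
}
```

With exactly representable inputs (for example h = 1.0, step = 0.25) both
variants agree; with arbitrary inputs the gap may be non-zero, and a test
comparing against the per-sample reference will see it.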

I tried various approaches, such as multiplying by 2.0 (instead of adding
the value to itself), without success.

I'm unable to benchmark the impact of this change; feel free to compare.

This commit fixes the incoming aacpsdsp tests.

Following is an alternative simplified function (matching the incoming
AArch64 code) that may be used:

function ff_ps_stereo_interpolate_neon, export=1
        vld1.32         {q0}, [r2]
        vld1.32         {q1}, [r3]
        ldr             r12, [sp]
        vmov.f32        q8, q0
        vmov.f32        q9, q1
        vzip.32         q8, q0
        vzip.32         q9, q1
1:
        vld1.32         {d4}, [r0,:64]
        vld1.32         {d6}, [r1,:64]
        vadd.f32        q8, q8, q9
        vadd.f32        q0, q0, q1
        vmov.f32        d5, d4
        vmov.f32        d7, d6
        vmul.f32        q2, q2, q8
        vmla.f32        q2, q3, q0
        vst1.32         {d4}, [r0,:64]!
        vst1.32         {d5}, [r1,:64]!
        subs            r12, r12, #1
        bgt             1b
        bx              lr
endfunc
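
For readers who do not follow NEON, the alternative function above can be
modelled by the scalar C sketch below. The name and signature are
hypothetical (the assembly only shows that four floats are loaded from each
of r2 and r3, and that the length comes from the stack), and the assembly
loop always runs at least once, whereas this sketch does nothing for
len <= 0.

```c
/* Hypothetical scalar model of the NEON routine above, not FFmpeg code.
 * l and r hold len complex samples as {re, im} pairs; h holds the four
 * mixing factors and h_step their per-sample increments. */
static void stereo_interpolate_sketch(float (*l)[2], float (*r)[2],
                                      const float h[4], const float h_step[4],
                                      int len)
{
    float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];

    for (int n = 0; n < len; n++) {
        float l_re = l[n][0], l_im = l[n][1];
        float r_re = r[n][0], r_im = r[n][1];

        /* Advance every factor by exactly one step per sample. */
        h0 += h_step[0];
        h1 += h_step[1];
        h2 += h_step[2];
        h3 += h_step[3];

        /* New left = h0 * left + h2 * right. */
        l[n][0] = h0 * l_re + h2 * r_re;
        l[n][1] = h0 * l_im + h2 * r_im;
        /* New right = h1 * left + h3 * right. */
        r[n][0] = h1 * l_re + h3 * r_re;
        r[n][1] = h1 * l_im + h3 * r_im;
    }
}
```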