arm: vp9itxfm16: Avoid reloading the idct32 coefficients
author: Martin Storsjö <martin@martin.st>
Fri, 24 Feb 2017 22:20:25 +0000 (00:20 +0200)
committer: Martin Storsjö <martin@martin.st>
Sun, 19 Mar 2017 20:53:57 +0000 (22:53 +0200)
commit: 32e273c111d8700dde895b80741622afc285ad3c
tree: f4cc534efeb7ff5e6d476c60ddf33b5e61ffafc7
parent: c1619318e540a214c730c6a300ebee0a4f450ba2

Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.
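
The register budget described above can be checked with a little arithmetic. The following is a minimal sketch, not part of the patch: it assumes 16 idct16 coefficients (inferred from q0-q3 holding 32 bit lanes, i.e. 4 registers of 4 lanes each), and the helper name `q_registers_needed` is hypothetical.

```python
Q_REGISTER_BITS = 128  # a NEON q register is 128 bits wide


def q_registers_needed(num_coeffs, bits_per_coeff):
    """Return how many q registers hold num_coeffs values of the given width."""
    total_bits = num_coeffs * bits_per_coeff
    # Ceiling division, so a partially filled register still counts.
    return -(-total_bits // Q_REGISTER_BITS)


# Assumption: 16 coefficients, inferred from q0-q3 at 32 bits per lane.
IDCT16_COEFFS = 16

lengthened = q_registers_needed(IDCT16_COEFFS, 32)  # lengthened 32 bit form
narrow = q_registers_needed(IDCT16_COEFFS, 16)      # narrow 16 bit form

print("lengthened form:", lengthened, "q registers")
print("narrow form:", narrow, "q registers")
```

Under that assumption the lengthened form fills q0-q3 and the narrow form fits in q0-q1, which is why narrowing in pass2 frees two registers.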

The idct16 coefficients are clobbered and reloaded within idct32_odd
though, since that turns out to be faster than narrowing them and
swapping them into q6-q7.

Before:                            Cortex       A7        A8        A9      A53
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22653.8   18268.4   19598.0  14079.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37699.0   38665.2   32542.3  24472.2
After:
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    22270.8   18159.3   19531.0  13865.0
vp9_inv_dct_dct_32x32_sub32_add_10_neon:   37523.3   37731.6   32181.7  24071.2
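
For a rough sense of scale, the relative saving per core can be derived from the figures above. This is a minimal sketch that only reuses the numbers in the table (lower is better); the helper name `percent_saved` and the dictionary layout are illustrative choices, not part of the patch.

```python
CORES = ["A7", "A8", "A9", "A53"]

# Figures copied from the Before/After table above.
BEFORE = {
    "sub4": [22653.8, 18268.4, 19598.0, 14079.0],
    "sub32": [37699.0, 38665.2, 32542.3, 24472.2],
}
AFTER = {
    "sub4": [22270.8, 18159.3, 19531.0, 13865.0],
    "sub32": [37523.3, 37731.6, 32181.7, 24071.2],
}


def percent_saved(before, after):
    """Relative reduction in percent, positive when 'after' is faster."""
    return (before - after) / before * 100.0


saved = {
    name: {
        core: percent_saved(b, a)
        for core, b, a in zip(CORES, BEFORE[name], AFTER[name])
    }
    for name in BEFORE
}

for name, per_core in saved.items():
    row = "  ".join(f"{core}: {pct:.2f}%" for core, pct in per_core.items())
    print(f"{name:>5}  {row}")
```

Every entry comes out positive, so the change is a small gain on all four cores for both the sub4 and sub32 cases.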

Signed-off-by: Martin Storsjö <martin@martin.st>
libavcodec/arm/vp9itxfm_16bpp_neon.S