git.sesse.net Git - ffmpeg/commit
aarch64: vp9itxfm: Avoid reloading the idct32 coefficients
author Martin Storsjö <martin@martin.st>
Mon, 2 Jan 2017 20:08:41 +0000 (22:08 +0200)
committer Martin Storsjö <martin@martin.st>
Sat, 11 Mar 2017 11:14:51 +0000 (13:14 +0200)
commit 2905657b902fea8718434f0d29056cf4e7434307
tree a6817d0eaa15d2dd602f92e1e2aea1166bf3a1b6
parent 600f4c9b03b8d39b986a00dd9dafa61be7d86a72

The idct32x32 function pushed d8-d15 onto the stack even though it
didn't clobber them. There are plenty of registers available, so all
the idct coefficients can be kept in registers instead of reloading
different subsets of them at different stages of the transform.

After this change, we can still skip pushing d12-d15.
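
As a rough illustration of the register-saving rule behind this, the
sketch below models which d-registers a function has to push under the
AArch64 procedure call standard, where the low 64 bits of v8-v15 are
callee-saved. The helper name regs_to_push and both clobber sets are
hypothetical and chosen only to mirror the description above; they are
not read from the actual assembly.

```python
# AAPCS64: a callee must preserve the low 64 bits (d8-d15) of v8-v15.
CALLEE_SAVED = range(8, 16)


def regs_to_push(clobbered):
    """Return the d-registers to save, given the v-register numbers a function writes."""
    return [f"d{n}" for n in CALLEE_SAVED if n in clobbered]


# Hypothetical clobber sets, for illustration only.
clobbered_before = set(range(0, 8)) | set(range(16, 32))  # v8-v15 untouched
clobbered_after = clobbered_before | {8, 9, 10, 11}       # assume v8-v11 now hold coefficients

print(regs_to_push(clobbered_before))  # []
print(regs_to_push(clobbered_after))   # ['d8', 'd9', 'd10', 'd11']
```

Under these assumptions, nothing needed saving before the change, and
only d8-d11 need saving afterwards, which is consistent with d12-d15
never being pushed.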

Before:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3
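
To put the two figures in proportion, here is a small sketch of the
arithmetic. The helper percent_saved is a hypothetical name introduced
here; the only inputs are the two numbers quoted above, and no unit is
assumed for them.

```python
def percent_saved(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100.0


before = 8128.3  # figure quoted under "Before:"
after = 8053.3   # figure quoted under "After:"

print(f"saved {before - after:.1f} ({percent_saved(before, after):.2f}%)")
# saved 75.0 (0.92%)
```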

This is cherry-picked from libav commit
65aa002d54433154a6924dc13e498bec98451ad0.

Signed-off-by: Martin Storsjö <martin@martin.st>
libavcodec/aarch64/vp9itxfm_neon.S