aarch64: vp9itxfm16: Make the larger core transforms standalone functions
author Martin Storsjö <martin@martin.st>
Fri, 24 Feb 2017 14:10:25 +0000 (16:10 +0200)
committer Martin Storsjö <martin@martin.st>
Sun, 19 Mar 2017 20:54:26 +0000 (22:54 +0200)
commit 0f2705e66b1f7f9ae900667c400e46fa0e4f15a7
tree 6aae4e968bdbfc39ff0f3e9f37d7eedc9f1a80d5
parent 0ea603203d1a46ea36cbaa3fb53d6fc69f5367ad

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from
26288 to 21512 bytes.

This costs a small slowdown of roughly 12 to 26 cycles (under 1%) in the
benchmarks below, but makes it more feasible to add more optimized
versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1887.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2801.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9691.4
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16154.9

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1899.5
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2827.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9714.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16175.9
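
The trade-off can be checked with a little arithmetic. The following is a minimal sketch, not part of the commit: it copies the cycle counts and object sizes quoted above and computes the per-function slowdown and the code size saving. The names BEFORE, AFTER, slowdown and the OBJ_SIZE_* constants are made up for this illustration.

```python
# Cycle counts copied from the Before/After listings in the commit message.
BEFORE = {
    "vp9_inv_dct_dct_16x16_sub4_add_10_neon": 1887.4,
    "vp9_inv_dct_dct_16x16_sub16_add_10_neon": 2801.5,
    "vp9_inv_dct_dct_32x32_sub4_add_10_neon": 9691.4,
    "vp9_inv_dct_dct_32x32_sub32_add_10_neon": 16154.9,
}
AFTER = {
    "vp9_inv_dct_dct_16x16_sub4_add_10_neon": 1899.5,
    "vp9_inv_dct_dct_16x16_sub16_add_10_neon": 2827.2,
    "vp9_inv_dct_dct_32x32_sub4_add_10_neon": 9714.7,
    "vp9_inv_dct_dct_32x32_sub32_add_10_neon": 16175.9,
}

# Object file sizes in bytes, as quoted in the commit message.
OBJ_SIZE_BEFORE = 26288
OBJ_SIZE_AFTER = 21512


def slowdown(before, after):
    """Return {name: (extra_cycles, percent_slower)} for each function."""
    result = {}
    for name, old in before.items():
        delta = after[name] - old
        result[name] = (delta, 100.0 * delta / old)
    return result


if __name__ == "__main__":
    for name, (delta, pct) in slowdown(BEFORE, AFTER).items():
        print(f"{name}: +{delta:.1f} cycles ({pct:.2f}%)")
    saved = OBJ_SIZE_BEFORE - OBJ_SIZE_AFTER
    print(f"object size: -{saved} bytes "
          f"({100.0 * saved / OBJ_SIZE_BEFORE:.1f}%)")
```

With these figures, every function slows down by less than 1%, while the object file shrinks by 4776 bytes, about 18%.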

Signed-off-by: Martin Storsjö <martin@martin.st>
libavcodec/aarch64/vp9itxfm_16bpp_neon.S