git.sesse.net Git - ffmpeg/commit
arm: vp9itxfm16: Make the larger core transforms standalone functions
author Martin Storsjö <martin@martin.st>
Fri, 24 Feb 2017 14:02:23 +0000 (16:02 +0200)
committer Martin Storsjö <martin@martin.st>
Sun, 19 Mar 2017 20:54:19 +0000 (22:54 +0200)
commit 0ea603203d1a46ea36cbaa3fb53d6fc69f5367ad
tree 91f51e32233010c87a677e57b184f134185c27d8
parent b76533f105cc01f6fb64199309fab84ba22da725

This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
17500 to 14516 bytes.

This causes a small slowdown of a few tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4237.4   3561.5   3971.8   2525.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6371.9   5452.0   5779.3   3910.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22068.8  17867.5  19555.2  13871.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37268.9  38684.2  32314.2  23969.0

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4375.1   3571.9   4283.8   2567.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6415.6   5578.9   5844.6   3948.3
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22653.7  18079.7  19603.7  13905.3
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37593.2  38862.2  32235.8  24070.9

Signed-off-by: Martin Storsjö <martin@martin.st>
libavcodec/arm/vp9itxfm_16bpp_neon.S