git.sesse.net Git - x264/commit

author	Martin Storsjö <martin@martin.st>
	Thu, 13 Aug 2015 20:59:28 +0000 (23:59 +0300)
committer	Henrik Gramner <henrik@gramner.com>
	Sun, 11 Oct 2015 16:44:54 +0000 (18:44 +0200)
commit	89439b2c604c81e13eb3da9e692d2cdae5a18b53
tree	a1de180e9d62eecaaa87a7ec48b85d00ba9512a5
parent	ff71457d71c5c11ed825d848677cab09c7639012

arm: Optimize x264_deblock_h_chroma_neon

Shuffle both chroma components together as a 16-bit unit, and
don't write back the unchanged columns (as in x264_deblock_h_luma_neon
and in the aarch64 version of this function).
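
For illustration only, here is a scalar C sketch of the idea for a single row
of interleaved (U,V) chroma. It is not the NEON code from this commit. The
function name sketch_deblock_h_chroma_row and its single tc parameter are
hypothetical, and the arithmetic assumes the usual H.264 normal-strength
chroma edge filter, which reads p1, p0, q0, q1 but only modifies p0 and q0.
Each (U,V) pair is moved as one 2-byte unit, and only the two middle pairs are
stored back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int clip3(int v, int lo, int hi) { return v < lo ? lo : v > hi ? hi : v; }
static uint8_t clip_u8(int v) { return (uint8_t)clip3(v, 0, 255); }

/* Hypothetical helper: filter one row across a vertical edge.
 * `edge` points at the U byte of q0; the row is laid out as
 * ... U(p1) V(p1) U(p0) V(p0) | U(q0) V(q0) U(q1) V(q1) ... */
void sketch_deblock_h_chroma_row(uint8_t *edge, int alpha, int beta, int tc)
{
    uint8_t px[8]; /* four (U,V) pairs: p1, p0, q0, q1 */

    assert(tc >= 0);
    memcpy(px, edge - 4, sizeof px); /* load all four pairs */

    for (int c = 0; c < 2; c++) { /* c = 0: U, c = 1: V */
        int p1 = px[0 + c], p0 = px[2 + c], q0 = px[4 + c], q1 = px[6 + c];
        if (abs(p0 - q0) < alpha && abs(p1 - p0) < beta && abs(q1 - q0) < beta) {
            /* Assumes >> on a negative int is an arithmetic shift. */
            int delta = clip3((((q0 - p0) * 4) + (p1 - q1) + 4) >> 3, -tc, tc);
            px[2 + c] = clip_u8(p0 + delta);
            px[4 + c] = clip_u8(q0 - delta);
        }
    }

    /* Store only p0 and q0 (4 bytes); p1 and q1 are never modified,
     * so they are not written back. */
    memcpy(edge - 2, px + 2, 4);
}
```

In this sketch the store covers 4 bytes per row instead of 8, which is the
"don't write the unchanged columns" part of the change described above.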

This causes a minor slowdown for x264_deblock_v_chroma_neon, but
it is negligible compared to the speedup of x264_deblock_h_chroma_neon.

checkasm timing        Cortex-A7    A8    A9
deblock_chroma[1]_c         4817  4057  3601
deblock_chroma[1]_neon      1249   716   817  (before)
deblock_chroma[1]_neon      1249   766   845  (after)

deblock_h_chroma_420_c      3699  3275  2830
deblock_h_chroma_420_neon   2068  1414  1400  (before)
deblock_h_chroma_420_neon   1838  1355  1291  (after)