git.sesse.net Git - x264/commit

author	Fiona Glaser <fiona@x264.com>
	Fri, 21 Nov 2008 11:39:11 +0000 (03:39 -0800)
committer	Fiona Glaser <fiona@x264.com>
	Sun, 23 Nov 2008 03:36:03 +0000 (19:36 -0800)
commit	80ea99c001eaab58a0ff54f0b2c4815cb2e63076
tree	353a8113dbba307b762cb6104196f054bee1ebb6
parent	7df060bedbc72232fdf48869cea47bcd480e8eda

Phenom CPU optimizations
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR.
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values are not worth it).
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
Merge cpu-32.asm and cpu-64.asm into cpu-a.asm.
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
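
The first item above trades an emulated PALIGNR for a plain unaligned load. The following is a minimal scalar model of why the two are interchangeable, not the x264 assembly itself: it assumes a 16-byte register modelled as a byte array, and the names vec16, palignr_emulated, load_unaligned and hpel_loads_match are hypothetical, chosen for illustration only. On a CPU without SSSE3, the emulated form needs two aligned loads plus two byte shifts and an OR, whereas the unaligned form is a single load.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of one 16-byte SSE register (illustrative, not x264 code). */
typedef struct { uint8_t b[16]; } vec16;

/* Model of an aligned 16-byte load; p is assumed to sit on a 16-byte boundary. */
static vec16 load_aligned(const uint8_t *p)
{
    vec16 v;
    memcpy(v.b, p, 16);
    return v;
}

/* Model of an unaligned 16-byte load from an arbitrary byte address. */
static vec16 load_unaligned(const uint8_t *p)
{
    vec16 v;
    memcpy(v.b, p, 16);
    return v;
}

/* Emulated PALIGNR: treat hi:lo as one 32-byte value, shift it right by
 * n bytes (0 <= n <= 15) and keep the low 16 bytes. Without SSSE3 this is
 * built from a right byte shift of lo, a left byte shift of hi, and an OR. */
static vec16 palignr_emulated(vec16 hi, vec16 lo, int n)
{
    vec16 r;
    for (int i = 0; i < 16; i++) {
        uint8_t from_lo = (i + n < 16)  ? lo.b[i + n]      : 0; /* lo >> n bytes        */
        uint8_t from_hi = (i + n >= 16) ? hi.b[i + n - 16] : 0; /* hi << (16 - n) bytes */
        r.b[i] = (uint8_t)(from_lo | from_hi);
    }
    return r;
}

/* Returns 1 if, for every byte offset 0..15, the emulated PALIGNR over two
 * aligned loads yields the same 16 bytes as one unaligned load. */
static int hpel_loads_match(void)
{
    uint8_t row[32];
    for (int i = 0; i < 32; i++)
        row[i] = (uint8_t)(i * 7 + 3); /* arbitrary test pattern */

    for (int n = 0; n < 16; n++) {
        vec16 a = palignr_emulated(load_aligned(row + 16), load_aligned(row), n);
        vec16 b = load_unaligned(row + n);
        if (memcmp(a.b, b.b, 16) != 0)
            return 0;
    }
    return 1;
}
```

Calling hpel_loads_match() checks the equivalence for all sixteen byte offsets. The model only shows that the two forms produce identical data; which one is faster depends on the CPU, and the speedups quoted above are the commit author's measurements on Phenom.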

Makefile
common/cpu.c
common/pixel.c
common/x86/cpu-64.asm	[deleted file]
common/x86/cpu-a.asm	[moved from common/x86/cpu-32.asm with 76% similarity]
common/x86/mc-a.asm
common/x86/mc-a2.asm
common/x86/mc-c.c
common/x86/pixel.h
common/x86/sad-a.asm
tools/checkasm.c
x264.h