git.sesse.net Git - x264/commit
Phenom CPU optimizations
author Fiona Glaser <fiona@x264.com>
Fri, 21 Nov 2008 11:39:11 +0000 (03:39 -0800)
committer Fiona Glaser <fiona@x264.com>
Sun, 23 Nov 2008 03:36:03 +0000 (19:36 -0800)
commit 80ea99c001eaab58a0ff54f0b2c4815cb2e63076
tree 353a8113dbba307b762cb6104196f054bee1ebb6
parent 7df060bedbc72232fdf48869cea47bcd480e8eda
Faster hpel_filter by using unaligned loads instead of emulated PALIGNR
Faster hpel_filter on 64-bit by using the 32-bit version (the cost of emulated PALIGNR is high enough that the savings from caching intermediate values are not worth it).
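To illustrate the trade-off between emulated PALIGNR and a plain unaligned load, here is a minimal C sketch using SSE2 intrinsics. It is not code from this commit: the function names and the fixed 3-byte shift are assumptions chosen for illustration. It shows that two aligned loads combined with PSRLDQ/PSLLDQ/POR fetch the same 16 bytes as a single MOVDQU.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Emulated PALIGNR for a fixed 3-byte shift (hypothetical helper):
 * two aligned loads, two byte shifts and an OR. */
static __m128i load_shift3_emulated(const uint8_t *aligned)
{
    assert(((uintptr_t)aligned & 15) == 0);  /* MOVDQA needs 16-byte alignment */
    __m128i lo = _mm_load_si128((const __m128i *)aligned);
    __m128i hi = _mm_load_si128((const __m128i *)(aligned + 16));
    /* PSRLDQ keeps bytes 3..15 of lo, PSLLDQ moves bytes 0..2 of hi to the top. */
    return _mm_or_si128(_mm_srli_si128(lo, 3), _mm_slli_si128(hi, 13));
}

/* The same 16 bytes fetched with one unaligned load (MOVDQU). */
static __m128i load_shift3_unaligned(const uint8_t *aligned)
{
    return _mm_loadu_si128((const __m128i *)(aligned + 3));
}

/* Returns 1 if both strategies produce identical bytes. */
static int shift3_strategies_agree(void)
{
    _Alignas(16) uint8_t buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)(i * 7 + 1);
    __m128i a = load_shift3_emulated(buf);
    __m128i b = load_shift3_unaligned(buf);
    return memcmp(&a, &b, 16) == 0;
}
```

The emulated path costs two loads and three ALU operations per shifted vector, while the unaligned path costs one load. Which is faster depends on how cheap unaligned loads are on the target CPU.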
Add support for misaligned_mask on Phenom: ~2% faster hpel_filter, ~4% faster width16 multisad, 7% faster width20 get_ref.
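As a rough sketch of what misaligned-SSE support involves, the C fragment below decodes the two bits concerned. The bit positions are an assumption taken from AMD's public documentation (CPUID function 0x80000001, ECX bit 7, and MXCSR bit 17), not from this commit's diff, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions, per AMD's public documentation. */
#define CPUID_EXT_ECX_MISALIGN_SSE (1u << 7)   /* CPUID 0x80000001, ECX */
#define MXCSR_MISALIGN_MASK        (1u << 17)  /* MXCSR.MM */

/* Hypothetical helper: does the extended-feature ECX word
 * advertise misaligned SSE mode? */
static int cpu_has_misalign_sse(uint32_t ext_ecx)
{
    return (ext_ecx & CPUID_EXT_ECX_MISALIGN_SSE) != 0;
}

/* Hypothetical helper: the MXCSR value to install once support
 * has been confirmed. */
static uint32_t mxcsr_with_misalign(uint32_t mxcsr)
{
    return mxcsr | MXCSR_MISALIGN_MASK;
}
```

With the mask set, SSE instructions that take a memory operand may read an unaligned address without faulting. The mask should only be written after the CPUID check succeeds, since setting a reserved MXCSR bit raises a fault on CPUs without the feature.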
Replace width12 mmx with width16 sse on Phenom and Nehalem: 32% faster width12 get_ref on Phenom.
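One way to picture the width12 substitution is a width-indexed function table whose width-12 slot points at the width-16 routine. The sketch below is an illustration under assumptions: all names are hypothetical, the scalar loops stand in for the SIMD routines, and it assumes source and destination rows are padded to at least 16 bytes so the extra four columns are harmless.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*avg_fn)(uint8_t *dst, int dst_stride,
                       const uint8_t *a, const uint8_t *b,
                       int src_stride, int height);

/* Scalar stand-in for a SIMD averaging kernel: rounded average
 * of two blocks over `width` columns. */
static void avg_block(uint8_t *dst, int dst_stride,
                      const uint8_t *a, const uint8_t *b,
                      int src_stride, int height, int width)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * dst_stride + x] =
                (uint8_t)((a[y * src_stride + x] + b[y * src_stride + x] + 1) >> 1);
}

#define DEFINE_AVG(w)                                                   \
    static void avg_w##w(uint8_t *dst, int dst_stride,                  \
                         const uint8_t *a, const uint8_t *b,            \
                         int src_stride, int height)                    \
    {                                                                   \
        avg_block(dst, dst_stride, a, b, src_stride, height, w);        \
    }

DEFINE_AVG(4)
DEFINE_AVG(8)
DEFINE_AVG(16)
DEFINE_AVG(20)

/* Indexed by width / 4. The width-12 slot reuses the width-16 routine,
 * which writes four extra columns into the padding. */
static const avg_fn avg_wtab[6] = {
    NULL, avg_w4, avg_w8, avg_w16 /* width 12 */, avg_w16, avg_w20
};

static void avg_dispatch(uint8_t *dst, int dst_stride,
                         const uint8_t *a, const uint8_t *b,
                         int src_stride, int height, int width)
{
    avg_wtab[width >> 2](dst, dst_stride, a, b, src_stride, height);
}
```

The first 12 columns come out identical to a dedicated width-12 routine. The cost is four wasted columns per row, which is only a win if one wide operation is cheaper than the narrower operations it replaces.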
Merge cpu-32.asm and cpu-64.asm
Thanks to Easy123 for contributing a Phenom box for a weekend so I could write these optimizations.
12 files changed:
Makefile
common/cpu.c
common/pixel.c
common/x86/cpu-64.asm [deleted file]
common/x86/cpu-a.asm [moved from common/x86/cpu-32.asm with 76% similarity]
common/x86/mc-a.asm
common/x86/mc-a2.asm
common/x86/mc-c.c
common/x86/pixel.h
common/x86/sad-a.asm
tools/checkasm.c
x264.h