git.sesse.net Git - x264/commit
Faster width4 SSD+SATD, SSE4 optimizations
author Fiona Glaser <fiona@x264.com>
Tue, 25 Nov 2008 09:04:26 +0000 (01:04 -0800)
committer Fiona Glaser <fiona@x264.com>
Tue, 25 Nov 2008 23:27:25 +0000 (15:27 -0800)
commit 69e69197c424bff9e4b90eb5d608f15b59ca77b4
tree ec952f7b2802998aecc9325e8b8b7e7ea3ef8c47
parent e76caf368c7044fdd1eff6a423d9518e9818a4ba
Do satd 4x8 by transposing the two blocks' positions and running satd 8x4.
Use pinsrd (SSE4) for faster width4 SSD.
Globally replace movlhps with punpcklqdq (it seems to be faster on Conroe).
Move mask_misalign declaration to cpu.h to avoid warning in encoder.c.
These optimizations help on Nehalem, Phenom, and Penryn CPUs.
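
The satd 4x8 trick can be illustrated with a scalar C sketch. This is not the assembly from this commit: the function names (satd_4x4, satd_8x4, satd_4x8_via_8x4) are hypothetical, the result is left unnormalized, and the sketch assumes the 8x4 kernel is simply the sum of two side-by-side 4x4 Hadamard blocks. Under that assumption, a 4x8 block (two 4x4 blocks stacked vertically) can be rearranged so its lower half sits to the right of its upper half, and the 8x4 routine can be reused unchanged.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Scalar 4x4 SATD: 2-D Hadamard transform of the difference block,
 * then the sum of absolute coefficients (unnormalized). */
static int satd_4x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    int d[4][4], t[4][4], sum = 0;

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            d[y][x] = a[y * sa + x] - b[y * sb + x];

    /* Horizontal butterflies on each row. */
    for (int y = 0; y < 4; y++) {
        int s01 = d[y][0] + d[y][1], d01 = d[y][0] - d[y][1];
        int s23 = d[y][2] + d[y][3], d23 = d[y][2] - d[y][3];
        t[y][0] = s01 + s23;
        t[y][1] = s01 - s23;
        t[y][2] = d01 + d23;
        t[y][3] = d01 - d23;
    }

    /* Vertical butterflies on each column, accumulating magnitudes. */
    for (int x = 0; x < 4; x++) {
        int s01 = t[0][x] + t[1][x], d01 = t[0][x] - t[1][x];
        int s23 = t[2][x] + t[3][x], d23 = t[2][x] - t[3][x];
        sum += abs(s01 + s23) + abs(s01 - s23)
             + abs(d01 + d23) + abs(d01 - d23);
    }
    return sum;
}

/* 8 wide, 4 tall: two 4x4 blocks side by side (assumption of this sketch). */
static int satd_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    return satd_4x4(a, sa, b, sb) + satd_4x4(a + 4, sa, b + 4, sb);
}

/* 4 wide, 8 tall: move the lower 4x4 block next to the upper one,
 * then reuse the 8x4 routine. */
static int satd_4x8_via_8x4(const uint8_t *a, int sa, const uint8_t *b, int sb)
{
    uint8_t ta[8 * 4], tb[8 * 4];

    for (int y = 0; y < 4; y++) {
        memcpy(ta + y * 8,     a + y * sa,       4); /* upper block, left  */
        memcpy(ta + y * 8 + 4, a + (y + 4) * sa, 4); /* lower block, right */
        memcpy(tb + y * 8,     b + y * sb,       4);
        memcpy(tb + y * 8 + 4, b + (y + 4) * sb, 4);
    }
    return satd_8x4(ta, 8, tb, 8);
}
```

In a SIMD version the rearrangement would presumably happen while loading rows into registers rather than through a temporary buffer; the copy here only makes the data movement explicit.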
common/cpu.c
common/cpu.h
common/pixel.c
common/x86/dct-a.asm
common/x86/deblock-a.asm
common/x86/mc-a.asm
common/x86/pixel-a.asm
common/x86/pixel.h
common/x86/x86util.asm
tools/checkasm.c