Full search SAD function optimization in SSE4.1
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4
(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.
Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
diff --git a/configure b/configure
index 39ef83f..11e086e 100755
--- a/configure
+++ b/configure
@@ -199,6 +199,7 @@
sse2
sse3
ssse3
+ sse4_1
altivec
"