Highbd intra pred H_PRED sse2 optimization
SSE2 vs. C speedup:
4x4 ~8.0x
8x8 ~8.2x
16x16 ~6.5x
32x32 ~3.8x
Block sizes covered:
4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16, 16x32, 32x16, 32x32
The square block size code is from libvpx commit
"30d9a1916 vpxdsp: [x86] add highbd_h_predictor functions".
Credit goes to Scott LaVarnway. The speed tests do not support
rectangular block sizes yet.
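For context, H_PRED fills each row of the block with that row's
left-neighbor pixel. Below is a minimal scalar sketch of that behavior,
for illustration only. The function name highbd_h_pred_sketch and its
signature are hypothetical, not the libaom or libvpx API, and the SSE2
code in this change is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for high-bitdepth horizontal prediction.
 * dst:    top-left pixel of the bw x bh destination block (16-bit samples)
 * stride: distance between rows of dst, in samples
 * left:   bh left-neighbor pixels, one per row */
static void highbd_h_pred_sketch(uint16_t *dst, ptrdiff_t stride, int bw,
                                 int bh, const uint16_t *left) {
  assert(dst != NULL && left != NULL);
  for (int r = 0; r < bh; ++r) {
    /* Replicate the left neighbor of row r across the whole row. */
    for (int c = 0; c < bw; ++c) dst[c] = left[r];
    dst += stride;
  }
}
```

With bw = 8 and bh = 4 this covers the 8x4 case. A SIMD version is
expected to produce identical output, typically by broadcasting each
left pixel into a vector and storing it across the row.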
Change-Id: I9a1f24aecab8de94f8ea59ec8748fe3537d721ae
diff --git a/aom_dsp/aom_dsp.cmake b/aom_dsp/aom_dsp.cmake
index b7f9b6b..0da4392 100644
--- a/aom_dsp/aom_dsp.cmake
+++ b/aom_dsp/aom_dsp.cmake
@@ -217,6 +217,7 @@
set(AOM_DSP_COMMON_INTRIN_SSE2
${AOM_DSP_COMMON_INTRIN_SSE2}
+ "${AOM_ROOT}/aom_dsp/x86/highbd_intrapred_sse2.c"
"${AOM_ROOT}/aom_dsp/x86/highbd_loopfilter_sse2.c")
set(AOM_DSP_COMMON_INTRIN_AVX2