SSE2 implementation of 4-tap filter
Add SSE2 implementations of aom_filter_block1d16_h4_sse2
and aom_filter_block1d16_v4_sse2 for block widths >= 16.
An improvement of approximately 24% over the 8-tap filter
is seen at the unit-test level.
Change-Id: I79d287f40de1828e8a0d9a9e309c630ea1b2b67d
diff --git a/aom_dsp/aom_dsp.cmake b/aom_dsp/aom_dsp.cmake
index 11ff737..1d681e6 100644
--- a/aom_dsp/aom_dsp.cmake
+++ b/aom_dsp/aom_dsp.cmake
@@ -56,6 +56,7 @@
"${AOM_ROOT}/aom_dsp/x86/inv_wht_sse2.asm")
list(APPEND AOM_DSP_COMMON_INTRIN_SSE2
+ "${AOM_ROOT}/aom_dsp/x86/aom_subpixel_8t_intrin_sse2.c"
"${AOM_ROOT}/aom_dsp/x86/aom_asm_stubs.c"
"${AOM_ROOT}/aom_dsp/x86/convolve.h"
"${AOM_ROOT}/aom_dsp/x86/convolve_sse2.h"