Lowbd intrapred DC/TOP/LEFT/128/V/H avx2

For prediction blocks of width 32, avx2 further speeds up the
prediction functions relative to sse2 (measured on an i7-6700):

32x32     avx2 v. sse2
DC        ~1.4x
top       ~1.5x
left      ~1.4x
128       ~1.5x
v         ~1.6x
h         ~1.2x

32x16     avx2 v. sse2
DC        ~2.2x
top       ~1.7x
left      ~1.6x
128       ~1.8x
v         ~1.9x

Note: 32x16 H_PRED on avx2 is not yet sufficiently faster than sse2.
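
For reference, the six predictors named above can be sketched in scalar
C as follows. This is only an illustration of what each mode computes,
not the code added by this change: the `*_sketch` names and the
`fill_block`/`rounded_avg` helpers are hypothetical, and the
round-to-nearest average is an assumption about the rounding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill a bw x bh block with one value. */
static void fill_block(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                       uint8_t val) {
  for (int r = 0; r < bh; ++r) memset(dst + r * stride, val, (size_t)bw);
}

/* Hypothetical helper: average of n pixels, rounded to nearest (assumed). */
static uint8_t rounded_avg(int sum, int n) {
  return (uint8_t)((sum + (n >> 1)) / n);
}

/* DC: average of the row above and the column to the left. */
static void dc_pred_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                           const uint8_t *above, const uint8_t *left) {
  int sum = 0;
  for (int i = 0; i < bw; ++i) sum += above[i];
  for (int i = 0; i < bh; ++i) sum += left[i];
  fill_block(dst, stride, bw, bh, rounded_avg(sum, bw + bh));
}

/* TOP: average of the row above only. */
static void dc_top_pred_sketch(uint8_t *dst, ptrdiff_t stride, int bw,
                               int bh, const uint8_t *above) {
  int sum = 0;
  for (int i = 0; i < bw; ++i) sum += above[i];
  fill_block(dst, stride, bw, bh, rounded_avg(sum, bw));
}

/* LEFT: average of the left column only. */
static void dc_left_pred_sketch(uint8_t *dst, ptrdiff_t stride, int bw,
                                int bh, const uint8_t *left) {
  int sum = 0;
  for (int i = 0; i < bh; ++i) sum += left[i];
  fill_block(dst, stride, bw, bh, rounded_avg(sum, bh));
}

/* 128: no neighbours used; fill with the 8-bit mid-grey value. */
static void dc_128_pred_sketch(uint8_t *dst, ptrdiff_t stride, int bw,
                               int bh) {
  fill_block(dst, stride, bw, bh, 128);
}

/* V: copy the row above into every row of the block. */
static void v_pred_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                          const uint8_t *above) {
  for (int r = 0; r < bh; ++r) memcpy(dst + r * stride, above, (size_t)bw);
}

/* H: fill row r with the left neighbour of that row. */
static void h_pred_sketch(uint8_t *dst, ptrdiff_t stride, int bw, int bh,
                          const uint8_t *left) {
  for (int r = 0; r < bh; ++r) memset(dst + r * stride, left[r], (size_t)bw);
}
```

With bw = 32, each output row is exactly 32 bytes, which is the width
of one 256-bit avx2 register, so a vectorized version could plausibly
write each row with a single store where sse2 needs two.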

Change-Id: I145ed504d1b3ea9df283b94927be66a2c6f81225
diff --git a/aom_dsp/aom_dsp.cmake b/aom_dsp/aom_dsp.cmake
index 89f294b..889f240 100644
--- a/aom_dsp/aom_dsp.cmake
+++ b/aom_dsp/aom_dsp.cmake
@@ -66,6 +66,7 @@
 
 set(AOM_DSP_COMMON_INTRIN_AVX2
     "${AOM_ROOT}/aom_dsp/x86/aom_subpixel_8t_intrin_avx2.c"
+    "${AOM_ROOT}/aom_dsp/x86/intrapred_avx2.c"
     "${AOM_ROOT}/aom_dsp/x86/inv_txfm_avx2.c"
     "${AOM_ROOT}/aom_dsp/x86/common_avx2.h"
     "${AOM_ROOT}/aom_dsp/x86/inv_txfm_common_avx2.h"