Fix highbd_variance 32x8 bug

As reported in BUG=aomedia:1363, there is an offending
over-read in upsampled_pref_error. It turns out that the
over-read only happens when the block size is 32x8. The
root cause is that aom_highbd_10_variance_32x8_sse2 uses
aom_highbd_calc16x16var_sse2, which reads 8 lines below
the valid input data.

This CL fixes aom_highbd_10_variance_32x8_sse2 and removes
the memset in upsampled_pref_error (which slows down the
encoder).
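
The over-read described above can be illustrated with a simplified
model. This is only a sketch: it assumes the variance wrapper tiles
the w x h block with square block_size x block_size kernels, stepping
by block_size in both directions, and that each kernel call reads
block_size rows. The helper names rows_touched and rows_overread are
hypothetical and do not exist in libaom.

```c
/* Hypothetical model of the tiling loop: returns how many rows of the
 * input are touched when a w x h block is covered by square kernels of
 * size block_size x block_size (assumed loop shape, not libaom code). */
static int rows_touched(int w, int h, int block_size) {
  int max_row = 0;
  for (int i = 0; i < h; i += block_size) {
    for (int j = 0; j < w; j += block_size) {
      /* Each kernel call is assumed to read rows [i, i + block_size). */
      if (i + block_size > max_row) max_row = i + block_size;
    }
  }
  return max_row;
}

/* Number of rows read below the valid h rows (0 if none). */
static int rows_overread(int w, int h, int block_size) {
  int touched = rows_touched(w, h, block_size);
  return touched > h ? touched - h : 0;
}
```

Under these assumptions, rows_overread(32, 8, 16) gives 8, matching
the 8 lines mentioned above, while rows_overread(32, 8, 8) gives 0,
which corresponds to the VAR_FN(32, 8, 8, 8) change in this CL.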

Change-Id: I6a708d2b648c53a445e3bc837934c65abe64a634
diff --git a/aom_dsp/x86/highbd_variance_sse2.c b/aom_dsp/x86/highbd_variance_sse2.c
index 8448985..e8d8cb2 100644
--- a/aom_dsp/x86/highbd_variance_sse2.c
+++ b/aom_dsp/x86/highbd_variance_sse2.c
@@ -190,7 +190,7 @@
 #if CONFIG_EXT_PARTITION_TYPES
 VAR_FN(16, 4, 16, 6);
 VAR_FN(8, 32, 8, 8);
-VAR_FN(32, 8, 16, 8);
+VAR_FN(32, 8, 8, 8);
 VAR_FN(16, 64, 16, 10);
 VAR_FN(64, 16, 16, 10);
 #endif
diff --git a/av1/encoder/mcomp.c b/av1/encoder/mcomp.c
index f3dcc70..3ddda27 100644
--- a/av1/encoder/mcomp.c
+++ b/av1/encoder/mcomp.c
@@ -668,7 +668,7 @@
                                 unsigned int *sse) {
   unsigned int besterr;
   if (xd->cur_buf->flags & YV12_FLAG_HIGHBITDEPTH) {
-    DECLARE_ALIGNED(16, uint16_t, pred16[MAX_SB_SQUARE]) = { 0 };
+    DECLARE_ALIGNED(16, uint16_t, pred16[MAX_SB_SQUARE]);
     if (second_pred != NULL) {
       if (mask) {
         aom_highbd_comp_mask_upsampled_pred(