Optimize aom_highbd_quantize_b_64x64 module

Add an SSE2 variant of aom_highbd_quantize_b_64x64_c.

Across multiple test cases, encoder time drops by 0.3% on average
at the speed = 4 preset.

At the module level, the SSE2 version is ~1.5x faster than the C code.

Change-Id: If17d4cd853f4db4d3caa2d1977939d441961558a
diff --git a/test/quantize_func_test.cc b/test/quantize_func_test.cc
index 2975c1d..e7afed9 100644
--- a/test/quantize_func_test.cc
+++ b/test/quantize_func_test.cc
@@ -398,6 +398,12 @@
              TX_32X32, TYPE_B, AOM_BITS_10),
   make_tuple(&aom_highbd_quantize_b_32x32_c, &aom_highbd_quantize_b_32x32_sse2,
              TX_32X32, TYPE_B, AOM_BITS_12),
+  make_tuple(&aom_highbd_quantize_b_64x64_c, &aom_highbd_quantize_b_64x64_sse2,
+             TX_64X64, TYPE_B, AOM_BITS_8),
+  make_tuple(&aom_highbd_quantize_b_64x64_c, &aom_highbd_quantize_b_64x64_sse2,
+             TX_64X64, TYPE_B, AOM_BITS_10),
+  make_tuple(&aom_highbd_quantize_b_64x64_c, &aom_highbd_quantize_b_64x64_sse2,
+             TX_64X64, TYPE_B, AOM_BITS_12)
 };
 
 INSTANTIATE_TEST_CASE_P(SSE2, QuantizeTest,