Add SSE4_2 version of crc hash function

1. Add SSE4_2 detection to rtcd
2. Add av1_get_crc_value_sse4_2 and the unit test AV1CrcHashTest.
3. av1_get_crc_value_sse4_2 computes a 32-bit CRC with the SSE4.2 crc32
instruction (CRC-32C), which is wider than the CRC produced by the C
version. The hash values from the SSE4_2 and C versions therefore differ,
but the encoder output should still be bitwise identical.
4. The speed test in AV1CrcHashTest shows the SSE4_2 version is about
11x to 51x faster than the C version:
   block   C (ns)       SSE4_2 (ns)  speedup
   64x64   1906883.00   75701.00     25.19x
   32x32    922948.00   38389.00     24.04x
   8x8      234861.00    4615.00     50.89x
   4x4      107561.00    9238.00     11.64x
5. Encoding 20 frames of foreman_cif.y4m shows the encoder is about 2%
faster overall.
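
For reference, a minimal portable sketch of the CRC-32C update that the
SSE4.2 crc32 instruction performs (reflected polynomial 0x82F63B78). The
names crc32c_update_bitwise, crc32c_update_sse42 and crc32c are
hypothetical and not part of libaom; whether av1_get_crc_value_sse4_2
applies an initial value or final XOR is not stated here, so the standard
CRC-32C convention is used only in the crc32c() wrapper. A CRC with a
different width or polynomial gives a different value for the same bytes,
which is why the SSE4_2 and C hash values are not expected to match.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE4_2__)
#include <nmmintrin.h> /* _mm_crc32_u8 */
#endif

/* Reflected form of the CRC-32C (Castagnoli) polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: bitwise CRC-32C update over `length` bytes.
 * Like the SSE4.2 crc32 instruction, it applies no initial value or
 * final XOR; the caller passes in the running CRC. */
static uint32_t crc32c_update_bitwise(uint32_t crc, const uint8_t *p,
                                      int length) {
  assert(length >= 0);
  for (int i = 0; i < length; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit) {
      /* Shift out the low bit; if it was 1, fold in the polynomial. */
      crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
    }
  }
  return crc;
}

#if defined(__SSE4_2__)
/* Hypothetical helper: the same update using the SSE4.2 intrinsic, one
 * byte at a time. A faster variant could use _mm_crc32_u32 or
 * _mm_crc32_u64 to consume 4 or 8 bytes per instruction. */
static uint32_t crc32c_update_sse42(uint32_t crc, const uint8_t *p,
                                    int length) {
  for (int i = 0; i < length; ++i) crc = _mm_crc32_u8(crc, p[i]);
  return crc;
}
#endif

/* Hypothetical wrapper: standard CRC-32C convention, with an initial
 * value and final XOR of 0xFFFFFFFF. */
static uint32_t crc32c(const uint8_t *p, int length) {
  return ~crc32c_update_bitwise(0xFFFFFFFFu, p, length);
}
```

With this sketch, crc32c((const uint8_t *)"123456789", 9) returns
0xE3069283, the standard CRC-32C check value. When built with SSE4.2
enabled (e.g. -msse4.2), crc32c_update_sse42() returns the same value as
crc32c_update_bitwise() for the same starting CRC and input.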

Change-Id: I1d3272cdb94733ac55a0f9affbb1faac3fdc78d1
diff --git a/av1/common/av1_rtcd_defs.pl b/av1/common/av1_rtcd_defs.pl
index b4960d8..43742ac 100755
--- a/av1/common/av1_rtcd_defs.pl
+++ b/av1/common/av1_rtcd_defs.pl
@@ -441,6 +441,10 @@
   add_proto qw/void av1_wedge_compute_delta_squares/, "int16_t *d, const int16_t *a, const int16_t *b, int N";
   specialize qw/av1_wedge_compute_delta_squares sse2/;
 
+  # hash
+  add_proto qw/uint32_t av1_get_crc_value/, "void *crc_calculator, uint8_t *p, int length";
+  specialize qw/av1_get_crc_value sse4_2/;
+
 }
 # end encoder functions