Add AVX2 variant of av1_warp_affine_sse4_1

For speed presets 1, 2, 3 and 4, encode time was reduced by 0.8%,
0.95%, 0.2% and 0.2%, respectively (averaged across multiple test cases).

At the module level, the AVX2 version is ~1.4x faster than the SSE4_1 version.

Change-Id: I5007f2a96347844e969a5cdcf9c7e3a2928bf8f3
diff --git a/av1/common/warped_motion.h b/av1/common/warped_motion.h
index a1a4f06..39ee388 100644
--- a/av1/common/warped_motion.h
+++ b/av1/common/warped_motion.h
@@ -34,6 +34,9 @@
 
 extern const int16_t warped_filter[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8];
 
+DECLARE_ALIGNED(8, extern const int8_t,
+                filter_8bit[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8]);
+
 static const uint8_t warp_pad_left[14][16] = {
   { 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
   { 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },