Added AVX2 variant for av1_warp_affine_sse4_1

For speed = 1, 2, 3 and 4 presets, observed encode time reductions of 0.8%, 0.95%, 0.2% and 0.2% respectively (averaged across multiple test cases).

Module gains improved by a factor of ~1.4x w.r.t. the SSE4_1 module.

Change-Id: I5007f2a96347844e969a5cdcf9c7e3a2928bf8f3

commit: a80c64e3dc7d287eeeb1ef8d961f3d74ec920c2d [log] [tgz]
author: Aniket Dhok <aniket.dhok@ittiam.com> Thu Apr 25 09:29:28 2019 +0530
committer: Yunqing Wang <yunqingwang@google.com> Thu May 02 23:36:01 2019 +0000
tree: 505ebf23492a78c8780e297711e0d07f926229e2
parent: 9424338d4d9388074612b909b50c72d3737ebb65 [diff] [blame]
diff --git a/av1/common/warped_motion.h b/av1/common/warped_motion.h
index a1a4f06..39ee388 100644
--- a/av1/common/warped_motion.h
+++ b/av1/common/warped_motion.h

@@ -34,6 +34,9 @@
 
 extern const int16_t warped_filter[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8];
 
+DECLARE_ALIGNED(8, extern const int8_t,
+                filter_8bit[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8]);
+
 static const uint8_t warp_pad_left[14][16] = {
   { 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
   { 2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
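
Note on the `filter_8bit` declaration added in the hunk above: the following is a minimal, self-contained sketch of what an 8-byte-aligned table of 8-tap `int8_t` rows looks like. It assumes a GCC/Clang-style definition of `DECLARE_ALIGNED`; the project's real macro may differ per compiler. The names `demo_filter_8bit` and `DEMO_ROWS`, both helper functions, and the coefficient values are hypothetical and are not taken from this change.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC/Clang attribute syntax, standing in for the project's macro. */
#define DECLARE_ALIGNED(n, typ, val) typ val __attribute__((aligned(n)))

/* Hypothetical row count; the real table uses WARPEDPIXEL_PREC_SHIFTS * 3 + 1. */
#define DEMO_ROWS 3

/* Illustrative 8-tap rows only; each row is exactly 8 bytes wide. */
DECLARE_ALIGNED(8, static const int8_t, demo_filter_8bit[DEMO_ROWS][8]) = {
  { 0, 0, 127, 1, 0, 0, 0, 0 },
  { 0, -1, 127, 2, 0, 0, 0, 0 },
  { 0, 0, 0, 127, 1, 0, 0, 0 },
};

/* Returns 1 if every row starts on an 8-byte boundary, 0 otherwise. */
static int demo_rows_are_8_aligned(void) {
  for (int r = 0; r < DEMO_ROWS; ++r) {
    if (((uintptr_t)&demo_filter_8bit[r][0] & 7) != 0) return 0;
  }
  return 1;
}

/* Sums the eight taps of one row. */
static int demo_row_sum(int r) {
  int sum = 0;
  for (int k = 0; k < 8; ++k) sum += demo_filter_8bit[r][k];
  return sum;
}
```

Because the base address is 8-byte aligned and each row occupies 8 bytes, every row in this sketch begins on an 8-byte boundary, so a whole row can be fetched with a single aligned 64-bit load.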