Port SIMD optimization for OBMC blending functions to av1

Add SIMD optimizations for the 1D blending functions used in OBMC mode,
along with some code refactoring and cleanup.
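
For reference, a 1D blend of this kind can be sketched in scalar C as
below. This is only an illustration, not code from this change: the name
blend_hmask_ref and the BLEND_* macros are hypothetical, and the sketch
assumes a 6-bit mask (values 0..64) with one mask value per column and
round-to-nearest on the weighted sum. A SIMD version would compute the
same result for several pixels per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 6-bit alpha, so mask values lie in [0, 64]. */
#define BLEND_BITS 6
#define BLEND_MAX (1 << BLEND_BITS)

/* Hypothetical scalar reference for a horizontal 1D blend.
 * mask[x] weights src0 and (64 - mask[x]) weights src1; the same
 * mask row is reused for every row of the block. */
static void blend_hmask_ref(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src0, ptrdiff_t src0_stride,
                            const uint8_t *src1, ptrdiff_t src1_stride,
                            const uint8_t *mask, int w, int h) {
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const int m = mask[x];
      assert(m <= BLEND_MAX);
      /* Weighted sum, then round to nearest and drop the 6 mask bits. */
      const int v = m * src0[x] + (BLEND_MAX - m) * src1[x];
      dst[x] = (uint8_t)((v + (BLEND_MAX >> 1)) >> BLEND_BITS);
    }
    dst += dst_stride;
    src0 += src0_stride;
    src1 += src1_stride;
  }
}
```

A mask value of 64 selects src0 unchanged and 0 selects src1 unchanged;
values in between give a rounded weighted average.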

(ped_1080p25.y4m, 150 frames, 2000 tb)
Encoding time overhead: +18.8% -> +18.1%
Decoding time overhead: +21.3% -> +8.7%

Change-Id: I9d856c32136e7e0e6e24ab5520ef901d7b1ee9c8
19 files changed