Add SSSE3 warp filter + const-ify warp filters

The SSSE3 filter is very similar to the SSE2 filter, but
the horizontal pass is sped up by using the 8-bit x 8-bit -> 16-bit
multiplies added in SSSE3.
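
For reference, a scalar sketch of what such a multiply does. It assumes
the instruction in question is PMADDUBSW (_mm_maddubs_epi16), which this
message does not name. The helper names sat16 and maddubs_model are
hypothetical and are not part of this change:

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range (signed saturation). */
static int16_t sat16(int32_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return (int16_t)v;
}

/* Scalar model of one 128-bit PMADDUBSW (assumed instruction):
 * 16 unsigned 8-bit pixels times 16 signed 8-bit coefficients,
 * with each adjacent pair of products summed into one saturated
 * 16-bit result, giving 8 outputs. */
static void maddubs_model(const uint8_t px[16], const int8_t coef[16],
                          int16_t out[8]) {
  for (int i = 0; i < 8; ++i) {
    int32_t sum = (int32_t)px[2 * i] * coef[2 * i] +
                  (int32_t)px[2 * i + 1] * coef[2 * i + 1];
    out[i] = sat16(sum);
  }
}
```

Since lowbd pixels fit in 8 bits, a filter whose coefficients also fit
in signed 8 bits can produce two taps' worth of products per output lane
in a single instruction, instead of first widening the pixels to 16 bits.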

Also apply const-correctness to all versions of the filter.

The timings of the existing filters are unchanged, and the
lowbd SSSE3 filter is ~17% faster than the lowbd SSE2 filter.

Timings per 8x8 block:
lowbd SSE2: 320ns
lowbd SSSE3: 273ns
highbd SSSE3: 300ns

Filter output is unchanged.

Change-Id: Ifb428a33b106d900cde1b080794796c0754ae182
11 files changed