CLPF: Replace v128_shr_n_s8 with v128_cmplt_s8 for sign extraction

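On x86 (SSE2) there is no 8-bit arithmetic shift instruction, so
v128_shr_n_s8(a, 7) has to be emulated with several instructions, while
v128_cmplt_s8(a, v128_zero()) maps to a single signed byte compare.
Both produce 0xff in negative lanes and 0 elsewhere, so the result is
unchanged. NEON has a native 8-bit arithmetic shift, so this should
have no impact there.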

Also replace v256_from_v128(v128_from_v64(a, b), v128_from_v64(c, d))
with v256_from_v64(a, b, c, d).

Change-Id: I711e3cb250689089d7b5336a294e9d6bdd998445
1 file changed