Add SSE3 versions for sad{32x32,64x64}x4d functions.

Overall encoding is about 15% faster.
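
For context, an x4d SAD function computes the sum of absolute differences
between one source block and four candidate reference blocks in a single
call, so each source pixel is loaded once and reused four times. The scalar
sketch below only illustrates that behavior. The name sad_x4d_ref, its
parameter list, and the generic width/height arguments are assumptions made
for this example and are not taken from this change; the real functions are
fixed-size and their signatures may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (not the code added by this change):
 * computes the SAD of one source block against four reference blocks.
 * All four references are assumed to share the same stride. */
static void sad_x4d_ref(const uint8_t *src, int src_stride,
                        const uint8_t *const ref[4], int ref_stride,
                        int width, int height, unsigned int sad[4]) {
  assert(width > 0 && height > 0);

  for (int i = 0; i < 4; ++i) sad[i] = 0;

  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Load the source pixel once and reuse it for all four candidates. */
      const int s = src[y * src_stride + x];
      for (int i = 0; i < 4; ++i) {
        sad[i] += (unsigned int)abs(s - ref[i][y * ref_stride + x]);
      }
    }
  }
}
```

A SIMD version would typically process 16 pixels per instruction (for
example with a packed SAD instruction) while keeping the same outputs as
this reference for 32x32 and 64x64 blocks.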

Change-Id: I176a775c704317509e32eee83739721804120ff2
2 files changed