Add an SSE version of sad4x8x4d and an SSE2 version of sad8x4x4d.

Encoding time for crew (CIF, first 50 frames) at 1500 kbps drops from
4min56 to 4min42, about 4.7% less.
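
For reference, a minimal scalar sketch of what the two functions are
assumed to compute: the sum of absolute differences (SAD) between one
source block and each of four candidate reference blocks, for 4x8 and
8x4 block sizes. The names sad_block, sad4x8x4d_ref and sad8x4x4d_ref
and the argument layout are hypothetical, chosen for illustration only;
they are not the signatures touched by this change, and the SIMD
versions are not shown.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar SAD of a single width x height block (illustrative helper). */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride,
                              int width, int height) {
  unsigned int sad = 0;
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x)
      sad += (unsigned int)abs(src[x] - ref[x]);
    src += src_stride; /* advance both blocks by one row */
    ref += ref_stride;
  }
  return sad;
}

/* Hypothetical reference: 4-wide, 8-high block against four references. */
static void sad4x8x4d_ref(const uint8_t *src, int src_stride,
                          const uint8_t *const ref[4], int ref_stride,
                          unsigned int sad_array[4]) {
  for (int i = 0; i < 4; ++i)
    sad_array[i] = sad_block(src, src_stride, ref[i], ref_stride, 4, 8);
}

/* Hypothetical reference: 8-wide, 4-high block against four references. */
static void sad8x4x4d_ref(const uint8_t *src, int src_stride,
                          const uint8_t *const ref[4], int ref_stride,
                          unsigned int sad_array[4]) {
  for (int i = 0; i < 4; ++i)
    sad_array[i] = sad_block(src, src_stride, ref[i], ref_stride, 8, 4);
}
```

A scalar version like this is mainly useful as a baseline for checking
that an optimized version returns the same four sums.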

Change-Id: I92c0c8b32980d2ae7c6dafc8b883a2c7fcd14a9f
3 files changed