Improve vp8_sad16x16_sse3 function

In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).

Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
1 file changed