Enable sse2 version of sad8x4/4x8

The encoding time for bus at CIF goes from 661s to 625s. This commit
also enabled unit test of sad8x4/4x8 in sad_test.cc.

Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1
3 files changed