Add SSE4.1 vpx_obmc_variance* implementations and cosmetics

Speedup for these functions: 4x
Also includes some cosmetic changes to the SAD functions.
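For reviewers, here is a minimal scalar sketch of the per-pixel computation the SIMD kernels vectorize. It assumes the OBMC formulation in which `wsrc` holds the mask-weighted source and `mask` holds per-pixel blending weights, both with 12 fractional bits, and that the residual is rounded with ties away from zero before accumulation. `obmc_variance_ref` and `round_shift_signed` are hypothetical names, not the C reference functions in this change.

```c
#include <stdint.h>

/* Hypothetical helper: signed right shift by n with rounding (ties away
 * from zero), assumed to match the rounding used by the C reference. */
static int32_t round_shift_signed(int32_t value, int n) {
  return value < 0 ? -(((-value) + (1 << (n - 1))) >> n)
                   : ((value + (1 << (n - 1))) >> n);
}

/* Hypothetical scalar reference: variance of the OBMC residual over a
 * w x h block. pre is strided; wsrc and mask are assumed dense (stride w). */
static unsigned int obmc_variance_ref(const uint8_t *pre, int pre_stride,
                                      const int32_t *wsrc, const int32_t *mask,
                                      int w, int h, unsigned int *sse) {
  int64_t sum = 0;   /* sum of residuals */
  uint64_t sq = 0;   /* sum of squared residuals */
  for (int i = 0; i < h; i++) {
    for (int j = 0; j < w; j++) {
      /* Remove the 12 fractional weight bits from the weighted difference. */
      int32_t diff = round_shift_signed(wsrc[j] - pre[j] * mask[j], 12);
      sum += diff;
      sq += (uint64_t)((int64_t)diff * diff);
    }
    pre += pre_stride;
    wsrc += w;
    mask += w;
  }
  *sse = (unsigned int)sq;
  /* variance = SSE - sum^2 / N */
  return (unsigned int)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```

An SSE4.1 version would compute several such `diff` values per instruction (for example with `_mm_mullo_epi32` on 32-bit lanes) and reduce `sum` and `sq` horizontally at the end.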

Change-Id: I344c32c795492507ae08742f52d035a13f583799
8 files changed