Improve SSE2 half-pixel filter functions

Rewrote these functions to process 16 pixels at a time instead of 8.
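
For illustration only, a minimal sketch of what a 16-pixel-wide step can
look like, assuming 8-bit pixels and a plain rounding average of
horizontal neighbours. The helper name half_horiz_avg16_sse2 is
hypothetical and this is not the code from this change; it only shows
that one 128-bit SSE2 register covers 16 such pixels per operation.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper, not part of this change.
 * Horizontal half-pixel average of 16 8-bit pixels:
 *   dst[i] = (src[i] + src[i + 1] + 1) >> 1, for i = 0..15.
 * Reads 17 bytes from src and writes 16 bytes to dst. */
static void half_horiz_avg16_sse2(const uint8_t *src, uint8_t *dst) {
  /* Unaligned loads of pixels 0..15 and their right neighbours 1..16. */
  const __m128i a = _mm_loadu_si128((const __m128i *)src);
  const __m128i b = _mm_loadu_si128((const __m128i *)(src + 1));
  /* _mm_avg_epu8 computes the rounded average of each unsigned byte pair. */
  _mm_storeu_si128((__m128i *)dst, _mm_avg_epu8(a, b));
}
```

An 8-pixel version of the same step would fill only half of the register,
so widening to 16 halves the number of iterations per row.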

Change-Id: Ic67e80124467a446a3df4cfecfb76a4248602adb
3 files changed