SSE2 intrinsics version of vp8_mbloop_filter_vertical_edge()

First SSE2 version of vp8_mbloop_filter_vertical_edge().  Intrinsics
are used for now, until the bitstream is finalized.  The function will
be revisited later for further performance improvements.

For the test clip used, decoder performance improved by more than 34%.
The gain will vary depending on the material.

Change-Id: I455b438bc8d8af76cf7533ac42eda5f689b21f7c
6 files changed