Fix two bugs in highbitdepth self-guided filter

This filter was temporarily removed due to test failures.
This patch reintroduces the filter and fixes two bugs:

* The test cases would occasionally segfault on x86 because
  the highbd filter requires its input buffers to be aligned
  to 16 bytes. Buffers from real videos always satisfy this,
  so adjust the test cases to match.

* The function calc_block was incorrect for bit_depth > 8
  because it passed an incorrect argument to _mm_srl_epi32().
  This was the cause of the original test failures.
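
For the alignment fix, the tests need to hand the filter 16-byte-aligned
sample buffers. A minimal C11 sketch, assuming a hypothetical helper
alloc_aligned_u16() (not the project's test code, which may use its own
aligned allocator). It also shows that offsetting an aligned uint16_t
buffer by a sample count that is not a multiple of 8 breaks alignment:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` 16-bit samples on a 16-byte
 * boundary, as required by aligned SSE loads. C11 aligned_alloc wants
 * the size to be a multiple of the alignment, so round it up. */
static uint16_t *alloc_aligned_u16(size_t count) {
  size_t bytes = count * sizeof(uint16_t);
  bytes = (bytes + 15) & ~(size_t)15;
  uint16_t *p = aligned_alloc(16, bytes);
  /* If allocation succeeded, the pointer must be 16-byte aligned. */
  assert(p == NULL || ((uintptr_t)p & 15) == 0);
  return p;
}

/* Returns nonzero if p lies on a 16-byte boundary. */
static int is_aligned16(const void *p) {
  return ((uintptr_t)p & 15) == 0;
}
```

Usage: a test would allocate its source and destination planes this way
and free() them afterwards; buf + 8 stays aligned, buf + 1 does not.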
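
The commit does not say which argument calc_block passed incorrectly, so
the sketch below only illustrates the contract of _mm_srl_epi32() on
x86 with SSE2; it is not the actual fix. The intrinsic reads its shift
count from the low 64 bits of the second operand. A count vector built
with _mm_set1_epi32() holds the count in every lane, so those 64 bits
exceed 31 for any nonzero count and every lane becomes zero. With a
count of 0 (e.g. a hypothetical shift of bit_depth - 8 at 8 bits) both
forms agree, which is one way such a bug could surface only at higher
bit depths. The helper names are hypothetical:

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>

/* Logical right shift of every 32-bit lane of v by `shift` bits.
 * _mm_cvtsi32_si128 puts `shift` in the low 32 bits and zeroes the
 * rest, which is the count layout _mm_srl_epi32 expects. */
static __m128i srl_epi32_by(__m128i v, int shift) {
  return _mm_srl_epi32(v, _mm_cvtsi32_si128(shift));
}

/* Pitfall: the low 64 bits read as ((uint64_t)shift << 32) | shift,
 * which is > 31 for any shift > 0, so the result is all zeros. */
static __m128i srl_epi32_wrong(__m128i v, int shift) {
  return _mm_srl_epi32(v, _mm_set1_epi32(shift));
}

/* Copy the four lanes to memory (lane 0 first) for inspection. */
static void store_lanes(__m128i v, int32_t out[4]) {
  _mm_storeu_si128((__m128i *)out, v);
}
```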
Change-Id: Ia06b76c3e6122eebadd0995fb62f32c2fcab8b3e
3 files changed