Add SIMD code for PVQ search

This reduces pvq_search_rdo_double's share of the runtime profile from
37% to 15% and, when PVQ is enabled, improves overall encoding speed by
~40%.
The SIMD code is not bit-exact with the C version and introduces a
slight PSNR regression on AWCY:

  PSNR | PSNR Cb | PSNR Cr | PSNR HVS | SSIM | MS SSIM | CIEDE 2000
0.0607 |  0.1044 |     N/A |   0.0126 |  N/A | -0.0309 |        N/A
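One common reason a SIMD path is not bit-exact with its scalar C reference is that floating-point addition is not associative. Splitting an accumulation into per-lane partial sums changes the order of the additions, and therefore the rounding. This commit does not say which operations differ, so the sketch below only assumes that reordered accumulation is the cause. The function names, the 4-lane layout, and the input values are all hypothetical and are not taken from pvq_search_rdo_double.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: add the terms strictly left to right. */
static double sum_sequential(const double *x, size_t n) {
  double s = 0;
  size_t i;
  for (i = 0; i < n; i++) s += x[i];
  return s;
}

/* Emulates a hypothetical 4-lane SIMD reduction. Each lane keeps its own
   partial sum over every 4th element, and the lanes are combined at the
   end with a pairwise horizontal add. */
static double sum_4lane(const double *x, size_t n) {
  double lane[4] = {0, 0, 0, 0};
  size_t i;
  assert(n % 4 == 0); /* this sketch has no scalar tail loop */
  for (i = 0; i < n; i += 4) {
    lane[0] += x[i];
    lane[1] += x[i + 1];
    lane[2] += x[i + 2];
    lane[3] += x[i + 3];
  }
  return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

For x = {1e17, 1e17, 1, 1, -1e17, -1e17, 1, 1}, sum_sequential returns 2.0 and sum_4lane returns 4.0. Near 2e17 the spacing between doubles is 32, so the scalar loop absorbs the first two 1s. The lane split cancels the large terms before the small ones are combined, so none of the 1s are lost. This case is deliberately extreme. In an encoder the differences usually appear only in the last bits, but they can still flip a rate-distortion decision, which would be consistent with the small metric changes above.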

Change-Id: Ie22cebc62df2e72618305f2268668d79167860c6
5 files changed