Combine fdct8x8 and quantization process

This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations into a single
function, which avoids storing the unquantized transform coefficients
and loading them again for quantization. Overall, speed -6 encoding
is slightly faster (around 1%), and its compression performance
improves by 3.4%.
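
For illustration only, the fusion described above can be sketched as
follows. This is not the code in this change: the function name
fdct8x8_quant_sketch, the floating-point DCT, and the single uniform
step q_step are all assumptions made to keep the example short. The
point it shows is that each coefficient is quantized as soon as the
column pass produces it, so no buffer of unquantized coefficients is
written out and read back.

```c
#include <stdint.h>

/* cos(j * pi / 16) for j = 0..8, tabulated to avoid a libm dependency. */
static const double kCosPi16[9] = {
  1.0,
  0.980785280403230449, 0.923879532511286756, 0.831469612302545237,
  0.707106781186547524, 0.555570233019602225, 0.382683432365089772,
  0.195090322016128268, 0.0
};

/* Returns cos(m * pi / 16) for m >= 0 using the symmetries of cosine. */
static double cos_pi16(int m) {
  m %= 32;                          /* period is 2*pi, i.e. 32 steps */
  if (m > 16) m = 32 - m;           /* cos(2*pi - x) == cos(x) */
  return m > 8 ? -kCosPi16[16 - m]  /* cos(pi - x) == -cos(x) */
               : kCosPi16[m];
}

/* Hypothetical fused 8x8 forward DCT + quantizer (illustrative only).
 * input:   8x8 residual block with the given row stride
 * q_step:  assumed uniform quantizer step (must be > 0)
 * qcoeff:  64 quantized coefficients, row-major
 * dqcoeff: 64 dequantized coefficients, row-major */
void fdct8x8_quant_sketch(const int16_t *input, int stride, int q_step,
                          int16_t *qcoeff, int16_t *dqcoeff) {
  double basis[8][8];
  double rows[8][8];
  int k, n, u, v, x, y;

  /* Orthonormal DCT-II basis: basis[k][n] = s(k) * cos((2n+1)k*pi/16),
   * with s(0) = sqrt(1/8) and s(k) = sqrt(2/8) = 0.5 otherwise. */
  for (k = 0; k < 8; ++k) {
    const double scale = (k == 0) ? 0.353553390593273762 : 0.5;
    for (n = 0; n < 8; ++n) basis[k][n] = scale * cos_pi16((2 * n + 1) * k);
  }

  /* Pass 1: 1-D transform of every row. */
  for (y = 0; y < 8; ++y) {
    for (u = 0; u < 8; ++u) {
      double sum = 0.0;
      for (x = 0; x < 8; ++x) sum += input[y * stride + x] * basis[u][x];
      rows[y][u] = sum;
    }
  }

  /* Pass 2: 1-D transform of every column, quantizing each coefficient
   * immediately instead of storing it for a separate quantizer call. */
  for (v = 0; v < 8; ++v) {
    for (u = 0; u < 8; ++u) {
      double coeff = 0.0;
      double mag;
      int q;
      for (y = 0; y < 8; ++y) coeff += rows[y][u] * basis[v][y];
      mag = coeff < 0.0 ? -coeff : coeff;
      q = (int)(mag / q_step + 0.5);  /* round magnitude to nearest */
      if (coeff < 0.0) q = -q;
      qcoeff[v * 8 + u] = (int16_t)q;
      dqcoeff[v * 8 + u] = (int16_t)(q * q_step);
    }
  }
}
```

A production encoder would use an integer transform and per-coefficient
quantizer parameters rather than this floating-point, single-step form.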

Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
4 files changed