SSE2 16x16 ADST/DCT hybrid transform

This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2
operations. It reduces the runtime from 5433 cycles to 1621 cycles, at
no compression performance loss.

Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230
2 files changed