Speed up of DST and the search in ext_tx

Adds an early termination to the ext_tx search, and also
implements the DST transforms more efficiently.

About 4 times faster with the ext-tx experiment.

There is a 0.09% drop in performance on derflr from 1.735% to
1.648%, but worth it with the speedup achieved.

Change-Id: I2ede9d69c557f25e0a76cd5d701cc0e36e825c7c
6 files changed