Speed up  tm_predictor_4x4

tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
3 files changed