Vectorize corner matching function

Add an SSE4 version of compute_cross_correlation() from
corner_match.c. The SSE4 version is about 3.4x as fast as
the scalar code, and determine_correspondence() as a whole
is about 2.5-3x as fast as before.

BUG=aomedia:487

Change-Id: I707b7cfd5c513c025d3ee7fb6a5f1fa335ecd495
7 files changed