Merge "Improve sad3x16 SSE2 function" into experimental