Merge "Add SSE4.1 code for deringing functions." into nextgenv2