Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2