Merge "HBD inverse HT 8x8 and 16x16 sse4.1 optimization" into nextgenv2