Merge "Zero high 128b YMM registers to avoid SSE-AVX transition penalties" into nextgenv2