Add an emms instruction to aom_subtract_block

This commit adds an emms instruction at the end of the MMX assembly in
aom_subtract_block, so that the MMX state is cleared and the x87 FPU
registers are usable again on return.

Note that this change causes a slight stats difference, which indicates
that a call to aom_clear_system_state is missing somewhere in the
codebase.

This also resolves a mismatch between the x86 and x86-64 builds. See
the bug tracker for more details.

BUG=aomedia:2968

STATS_CHANGED

Change-Id: I22b0269017007a5d35ad19cd7e15a838d5ebd326
diff --git a/aom_dsp/x86/subtract_sse2.asm b/aom_dsp/x86/subtract_sse2.asm
index 1a75a23..af38022 100644
--- a/aom_dsp/x86/subtract_sse2.asm
+++ b/aom_dsp/x86/subtract_sse2.asm
@@ -143,4 +143,5 @@
   lea                predq, [predq+pred_str*2]
   sub                rowsd, 2
   jg .loop_4
+  emms
   RET