aomedia /
avm /
977dccd12cce31a318240dbb088e43b3f2648519 Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode
- Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
and fdct4x4_sse4_1().
- Used logic right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames
sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12,
--bit-depth=12, 50 frames.
- Unit test passed.
Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
4 files changed