)]}'
{
  "commit": "faa8b1669c59cd927374aa53bd4a127e9be1b4a3",
  "tree": "b4d98f56b0baf3921cd47ba447022c486c94adb0",
  "parents": [
    "452bb4c07a4f98630396e74cd886ef8d381f9466"
  ],
  "author": {
    "name": "Jerome Jiang",
    "email": "jianj@google.com",
    "time": "Wed May 27 12:52:36 2026 -0400"
  },
  "committer": {
    "name": "Jerome Jiang",
    "email": "jianj@google.com",
    "time": "Tue Jun 02 11:07:09 2026 -0700"
  },
  "message": "Conv horiz: expand hwy avx512\n\nFurther optimize for small blocks.\nBut fall back to avx2 for blocks with height 32.\n\n- Use 8-bit pairwise multiply-accumulate (SatWidenMulPairwiseAdd)\n  instead of 16-bit math for w \u003c\u003d 32 with even coefficients.\n- Halve filter coefficients to fit in int8_t and avoid overflow,\n  adjusting final scaling shift to FILTER_BITS - 1.\n- Eliminate expensive 8-bit to 16-bit pixel promotion.\n- Add specialized unrolled loops for w \u003d 4, 8, 16, and 32.\n\nAll blocks now show significant speed up except for small\nslow downs for block 16x32, 32x32 and 64x32\n\nI\u0027ll further investigate these block sizes.\n\nSize       | avx2      | avx512 (diff)\n------------------------------------------\n4x4        |    5.62µs |    4.03µs (-28.3%)\n4x8        |    6.78µs |    5.17µs (-23.7%)\n8x4        |    5.94µs |    4.03µs (-32.2%)\n8x8        |    6.75µs |    5.17µs (-23.4%)\n8x16       |   10.01µs |    7.66µs (-23.4%)\n16x8       |    7.28µs |    6.49µs (-10.8%)\n16x16      |   10.92µs |    10.47µs (-4.1%)\n16x32      |   17.94µs |   19.83µs (+10.5%)\n32x16      |   19.34µs |    19.59µs (+1.3%)\n32x32      |   33.67µs |   38.31µs (+13.8%)\n32x64      |  170.90µs |  153.10µs (-10.4%)\n64x32      |   68.21µs |   76.28µs (+11.8%)\n64x64      |  307.20µs |  151.80µs (-50.6%)\n64x128     |  677.800s |  305.30µs (-55.0%)\n128x64     |  527.90µs |  298.60µs (-43.4%)\n128x128    |    1.35ms |  593.90µs (-56.1%)\n\nChange-Id: I4134a9ca0e233855761f6b03c5f35e8fcf8e25fa\n",
  "tree_diff": [
    {
      "type": "modify",
      "old_id": "e5be37ec9f02352901b2326428fb73b05f884306",
      "old_mode": 33188,
      "old_path": "aom_dsp/convolve_hwy.h",
      "new_id": "0b2853103806c3dd0aae5259fb3d6c5bff953b97",
      "new_mode": 33188,
      "new_path": "aom_dsp/convolve_hwy.h"
    },
    {
      "type": "modify",
      "old_id": "c1aa90492fb2ed45c42415d7c71f9ae712cded8b",
      "old_mode": 33188,
      "old_path": "aom_dsp/x86/convolve_hwy_avx512.cc",
      "new_id": "62557047caae634edcb57f05a84c19013a26c122",
      "new_mode": 33188,
      "new_path": "aom_dsp/x86/convolve_hwy_avx512.cc"
    }
  ]
}
