Write shorter matrix_coef_delta if possible

I believe this optimization is the reason why the decoder reconstructs
the matrix coefficient by calculating (prev + delta + 256) % 256 rather
than the more obvious prev + delta. Take advantage of that, even though
delta is unlikely to be < -128 or >= 128.

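The equivalence described above can be checked with a small standalone sketch. It is not part of the patch: wrap_delta(), reconstruct(), svlc_bits(), and check_all_pairs() are hypothetical helper names, NUM_QM_VALS is assumed to be 256 to match the message above, and svlc_bits() assumes svlc() is a zigzag sign mapping followed by an Exp-Golomb style code, which may not match the real bit writer exactly.

```c
#include <assert.h>

/* Assumed value, matching the 256 used in the commit message. */
#define NUM_QM_VALS 256

/* Map delta to the equivalent value (modulo NUM_QM_VALS) in the range
 * [-NUM_QM_VALS / 2, NUM_QM_VALS / 2). */
int wrap_delta(int delta) {
  if (delta < -(NUM_QM_VALS / 2)) return delta + NUM_QM_VALS;
  if (delta >= NUM_QM_VALS / 2) return delta - NUM_QM_VALS;
  return delta;
}

/* Decoder-side reconstruction as described in the commit message. */
int reconstruct(int prev, int delta) {
  return (prev + delta + NUM_QM_VALS) % NUM_QM_VALS;
}

/* Assumed cost model for svlc(): positive values map to 2 * v - 1,
 * non-positive values map to 2 * -v, and the unsigned result u is coded
 * in 2 * floor(log2(u + 1)) + 1 bits. */
int svlc_bits(int value) {
  unsigned u = value > 0 ? 2u * (unsigned)value - 1u : 2u * (unsigned)(-value);
  int bits = 1;
  for (unsigned x = u + 1u; x > 1u; x >>= 1) bits += 2;
  return bits;
}

/* Exhaustively check every (prev, coeff) pair in [0, NUM_QM_VALS):
 * the wrapped delta must decode to the same coefficient and must never
 * cost more bits than the plain delta under the assumed cost model. */
int check_all_pairs(void) {
  for (int prev = 0; prev < NUM_QM_VALS; prev++) {
    for (int coeff = 0; coeff < NUM_QM_VALS; coeff++) {
      int delta = coeff - prev;
      int wrapped = wrap_delta(delta);
      if (reconstruct(prev, wrapped) != coeff) return 0;
      if (svlc_bits(wrapped) > svlc_bits(delta)) return 0;
    }
  }
  return 1;
}
```

For example, with prev = 250 and coeff = 10 the plain delta is -240, while the wrapped delta is 16; both reconstruct to 10, and under the assumed cost model the wrapped value takes 11 bits instead of 17.
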
diff --git a/av1/encoder/bitstream.c b/av1/encoder/bitstream.c
index 3749eae..748e091 100644
--- a/av1/encoder/bitstream.c
+++ b/av1/encoder/bitstream.c
@@ -6051,7 +6051,21 @@
       int16_t prev = 32;
       for (int i = 0; i < tx_size_2d[tsize]; i++) {
         int16_t coeff = mat[s->scan[i]];
-        aom_wb_write_svlc(wb, coeff - prev);
+        int16_t delta = coeff - prev;
+        // The decoder reconstructs the matrix coefficient by calculating
+        // (prev + delta + NUM_QM_VALS) % NUM_QM_VALS. Therefore delta,
+        // delta + NUM_QM_VALS, and delta - NUM_QM_VALS are all equivalent
+        // because they are equal modulo NUM_QM_VALS. If delta + NUM_QM_VALS or
+        // delta - NUM_QM_VALS has a smaller absolute value than delta, it is
+        // likely to have a shorter svlc() code, so we will write it instead.
+        // In other words, for each delta value, we aim to find an equivalent
+        // value (modulo NUM_QM_VALS) that has the shortest svlc() code.
+        if (delta < -(NUM_QM_VALS / 2)) {
+          delta += NUM_QM_VALS;
+        } else if (delta >= NUM_QM_VALS / 2) {
+          delta -= NUM_QM_VALS;
+        }
+        aom_wb_write_svlc(wb, delta);
         prev = coeff;
       }
     }