Fix buffer overrun in dist_wtd_convolve_2d_horiz_neon

When introducing the 6-tap specialization for
dist_wtd_convolve_2d_vert_neon[1], we attempted to reduce the number of
rows processed in the horizontal convolution (by 2) if the subsequent
vertical convolution would be using a 6-tap filter instead of an 8-tap
filter.

This logic was faulty: the horizontal convolution processes 4 rows of
data per iteration, but the number of rows to be processed is not
necessarily a multiple of 4, so we ended up accessing memory outside of
the (padded) source buffer.

This patch restores the previous logic and src pointer starting
position for dist_wtd_convolve_2d_horiz_neon.

[1] Commit hash: be1d80024928684b1c9eebc648ed92d8ea70d166

Bug: aomedia:3367
Change-Id: Id11d4ae7c123957d3336b61f4d4cc8da09131b68
(cherry picked from commit 81da208fb6e0a1fa6a3714a7b7dbe3f613e8bc64)
diff --git a/av1/common/arm/jnt_convolve_neon.c b/av1/common/arm/jnt_convolve_neon.c
index 700dc54..36c8f9c 100644
--- a/av1/common/arm/jnt_convolve_neon.c
+++ b/av1/common/arm/jnt_convolve_neon.c
@@ -1163,9 +1163,9 @@
   const int y_filter_taps = get_filter_tap(filter_params_y, subpel_y_qn);
   const int clamped_y_taps = y_filter_taps < 6 ? 6 : y_filter_taps;
 
-  const int im_h = h + clamped_y_taps - 1;
+  const int im_h = h + filter_params_y->taps - 1;
   const int im_stride = MAX_SB_SIZE;
-  const int vert_offset = clamped_y_taps / 2 - 1;
+  const int vert_offset = filter_params_y->taps / 2 - 1;
   const int horiz_offset = filter_params_x->taps / 2 - 1;
   const int round_0 = conv_params->round_0 - 1;
   const uint8_t *src_ptr = src - vert_offset * src_stride - horiz_offset;
@@ -1182,9 +1182,10 @@
   dist_wtd_convolve_2d_horiz_neon(src_ptr, src_stride, im_block, im_stride,
                                   x_filter, im_h, w, round_0);
 
-  if (clamped_y_taps <= 6) {
-    dist_wtd_convolve_2d_vert_6tap_neon(im_block, im_stride, dst8, dst8_stride,
-                                        conv_params, y_filter, h, w);
+  if (clamped_y_taps == 6) {
+    dist_wtd_convolve_2d_vert_6tap_neon(im_block + im_stride, im_stride, dst8,
+                                        dst8_stride, conv_params, y_filter, h,
+                                        w);
   } else {
     dist_wtd_convolve_2d_vert_8tap_neon(im_block, im_stride, dst8, dst8_stride,
                                         conv_params, y_filter, h, w);
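
Note: the row arithmetic behind this change can be sketched in plain C. This is a simplified scalar model, not the NEON code. It assumes that a loop handling 4 rows per iteration touches rows up to the next multiple of 4, and that 6-tap kernels are stored as 8 coefficients with zero outer taps. The helper names (im_rows, vert_offset, rows_touched, last_row_touched, filter8, filter6) are hypothetical and exist only for illustration.

```c
/* Simplified scalar model of the row arithmetic; all names are hypothetical. */

/* Intermediate rows needed for an h-row block with a `taps`-tap layout. */
static int im_rows(int h, int taps) { return h + taps - 1; }

/* Rows above the block at which the horizontal pass starts reading. */
static int vert_offset(int taps) { return taps / 2 - 1; }

/* Assumption: a 4-rows-per-iteration loop touches a multiple of 4 rows. */
static int rows_touched(int rows) { return (rows + 3) & ~3; }

/* One past the last source row touched, relative to the block's first row. */
static int last_row_touched(int h, int taps) {
  return rows_touched(im_rows(h, taps)) - vert_offset(taps);
}

/* 8-tap vertical filter over one column, starting at row r. */
static int filter8(const int *col, int r, const int f[8]) {
  int sum = 0;
  for (int k = 0; k < 8; k++) sum += f[k] * col[r + k];
  return sum;
}

/* 6-tap filter using the middle coefficients f[1..6], starting at row r. */
static int filter6(const int *col, int r, const int f[8]) {
  int sum = 0;
  for (int k = 0; k < 6; k++) sum += f[k + 1] * col[r + k];
  return sum;
}
```

Under this model, for h = 4 the 8-tap layout gives last_row_touched(4, 8) == 9, while the clamped 6-tap layout gives last_row_touched(4, 6) == 10: fewer rows are requested, but the later start row pushes the rounded-up read one row further down. With zero outer taps, filter8(col, r, f) equals filter6(col, r + 1, f), which is consistent with passing im_block + im_stride to the 6-tap vertical pass once the 8-tap layout is restored.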