Fix buffer overrun in dist_wtd_convolve_2d_horiz_neon
When introducing the 6-tap specialization for
dist_wtd_convolve_2d_vert_neon[1], we attempted to reduce the number
of rows processed in the horizontal convolution by 2 whenever the
subsequent vertical convolution would use a 6-tap filter instead of
an 8-tap filter. This logic was faulty: the horizontal convolution
processes 4 rows of data per iteration, but the number of rows to be
processed is not necessarily a multiple of 4, so we ended up
accessing memory outside of the (padded) source buffer.
This patch restores the previous logic and src pointer starting
position for dist_wtd_convolve_2d_horiz_neon.
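For illustration, the row arithmetic behind this fix can be sketched in
standalone C. This is a simplified model, not the libaom code: the helper
names (intermediate_rows, vertical_offset, rows_touched,
rows_to_skip_for_6tap) are hypothetical, a nominal tap count of 8 is assumed
for filter_params_y->taps, and the horizontal pass is modelled as simply
rounding its row count up to a multiple of 4.

```c
#include <assert.h>

/* Hypothetical helper: rows of intermediate data needed so that a vertical
 * filter with `taps` taps can produce `h` output rows. */
static int intermediate_rows(int h, int taps) {
  assert(h > 0 && taps > 0 && taps % 2 == 0);
  return h + taps - 1;
}

/* Hypothetical helper: number of rows above the block at which the source
 * pointer starts for a filter with `taps` taps. */
static int vertical_offset(int taps) {
  assert(taps > 0 && taps % 2 == 0);
  return taps / 2 - 1;
}

/* Simplified model (an assumption, not the real loop): a pass that handles
 * `rows_per_iter` rows per iteration touches the row count rounded up to
 * the next multiple of `rows_per_iter`. */
static int rows_touched(int rows, int rows_per_iter) {
  assert(rows > 0 && rows_per_iter > 0);
  return (rows + rows_per_iter - 1) / rows_per_iter * rows_per_iter;
}

/* Rows a 6-tap vertical pass skips at the top of an intermediate block that
 * was produced from the 8-tap starting position. */
static int rows_to_skip_for_6tap(void) {
  return vertical_offset(8) - vertical_offset(6);
}
```

Under these assumptions, a block of height 8 needs 15 intermediate rows with
8 taps and 13 with 6 taps, and the model touches 16 rows in both cases. With
the shortened count the source pointer also starts one row lower, so the last
row touched lies one row further down. The one-row difference in
vertical_offset is consistent with the 6-tap vertical pass now starting at
im_block + im_stride.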
[1] Commit hash: be1d80024928684b1c9eebc648ed92d8ea70d166
Bug: aomedia:3367
Change-Id: Id11d4ae7c123957d3336b61f4d4cc8da09131b68
diff --git a/av1/common/arm/jnt_convolve_neon.c b/av1/common/arm/jnt_convolve_neon.c
index 700dc54..36c8f9c 100644
--- a/av1/common/arm/jnt_convolve_neon.c
+++ b/av1/common/arm/jnt_convolve_neon.c
@@ -1163,9 +1163,9 @@
   const int y_filter_taps = get_filter_tap(filter_params_y, subpel_y_qn);
   const int clamped_y_taps = y_filter_taps < 6 ? 6 : y_filter_taps;
-  const int im_h = h + clamped_y_taps - 1;
+  const int im_h = h + filter_params_y->taps - 1;
   const int im_stride = MAX_SB_SIZE;
-  const int vert_offset = clamped_y_taps / 2 - 1;
+  const int vert_offset = filter_params_y->taps / 2 - 1;
   const int horiz_offset = filter_params_x->taps / 2 - 1;
   const int round_0 = conv_params->round_0 - 1;
   const uint8_t *src_ptr = src - vert_offset * src_stride - horiz_offset;
@@ -1182,9 +1182,10 @@
   dist_wtd_convolve_2d_horiz_neon(src_ptr, src_stride, im_block, im_stride,
                                   x_filter, im_h, w, round_0);
-  if (clamped_y_taps <= 6) {
-    dist_wtd_convolve_2d_vert_6tap_neon(im_block, im_stride, dst8, dst8_stride,
-                                        conv_params, y_filter, h, w);
+  if (clamped_y_taps == 6) {
+    dist_wtd_convolve_2d_vert_6tap_neon(im_block + im_stride, im_stride, dst8,
+                                        dst8_stride, conv_params, y_filter, h,
+                                        w);
   } else {
     dist_wtd_convolve_2d_vert_8tap_neon(im_block, im_stride, dst8, dst8_stride,
                                         conv_params, y_filter, h, w);