Fix bottom ref condition for intra prediction in cb4x4

Resolve a rare encoder/decoder mismatch triggered by the 4x4 chroma
block size in sub8x8 block coding.

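To avoid a 2x2 chroma block size, the coding information of the top-left
4x4 luma block is always used to predict the 4x4 chroma block. The
remaining three 4x4 luma blocks are coded independently, without coding
their collocated chroma blocks. Because the chroma block therefore
covers the whole 8x8 luma area while the frame-edge distances are
measured from the top-left 4x4 luma block, the right (xr) and bottom
(yd) distances of a sub8x8 chroma block are now reduced by 2 pixels.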

The compression performance gain is 1.45% on the lowres set (down from
2.4% with the original cb4x4 design).

Change-Id: Iff560fcab172ed4219434d5174c4d8dfbbb04135
diff --git a/av1/common/reconintra.c b/av1/common/reconintra.c
index 4f99a98..367efdd 100644
--- a/av1/common/reconintra.c
+++ b/av1/common/reconintra.c
@@ -1895,14 +1895,22 @@
   const int mi_col = -xd->mb_to_left_edge >> (3 + MI_SIZE_LOG2);
   const int txwpx = tx_size_wide[tx_size];
   const int txhpx = tx_size_high[tx_size];
+#if CONFIG_CB4X4
+  const int xr_chr_offset = (plane && bsize < BLOCK_8X8) ? 2 : 0;
+  const int yd_chr_offset = (plane && bsize < BLOCK_8X8) ? 2 : 0;
+#else
+  const int xr_chr_offset = 0;
+  const int yd_chr_offset = 0;
+#endif
+
   // Distance between the right edge of this prediction block to
   // the frame right edge
-  const int xr =
-      (xd->mb_to_right_edge >> (3 + pd->subsampling_x)) + (wpx - x - txwpx);
+  const int xr = (xd->mb_to_right_edge >> (3 + pd->subsampling_x)) +
+                 (wpx - x - txwpx) - xr_chr_offset;
   // Distance between the bottom edge of this prediction block to
   // the frame bottom edge
-  const int yd =
-      (xd->mb_to_bottom_edge >> (3 + pd->subsampling_y)) + (hpx - y - txhpx);
+  const int yd = (xd->mb_to_bottom_edge >> (3 + pd->subsampling_y)) +
+                 (hpx - y - txhpx) - yd_chr_offset;
   const int right_available =
       (mi_col + ((col_off + txw) >> (1 - pd->subsampling_x))) <
       xd->tile.mi_col_end;