Improve temporal filtering with cross-plane info.

Currently, the plane-wise temporal filtering strategy filters the Y, U,
and V planes independently. However, the motion search used in temporal
filtering is performed only on the Y-plane, so the information from the
Y-plane is more accurate than that from the U and V planes. Accordingly,
this CL filters the U and V planes using the information from the
Y-plane in addition to their own. This significantly improves PSNR on
the U-plane and V-plane.
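
For illustration only, the cross-plane accumulation can be sketched as a
standalone helper. The function name collocated_luma_sse and its
parameters are hypothetical and do not exist in libaom. The sketch
assumes the squared differences of the Y-plane block are stored row by
row, and that ss_x and ss_y are the chroma subsampling shifts relative
to luma (1 and 1 for 4:2:0, 0 and 0 for 4:4:4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of libaom): sums the Y-plane squared
// differences collocated with the chroma pixel at row i, column j.
// chroma_w is the chroma block width, so the Y-plane block is
// (chroma_w << ss_x) pixels wide. *num_pixels is incremented once per
// Y-plane pixel visited.
static uint64_t collocated_luma_sse(const uint32_t *luma_sq_diff,
                                    int chroma_w, int i, int j, int ss_x,
                                    int ss_y, int *num_pixels) {
  assert(luma_sq_diff != NULL && num_pixels != NULL);
  const int luma_w = chroma_w << ss_x;  // Width of the Y-plane block.
  uint64_t sum = 0;
  for (int ii = 0; ii < (1 << ss_y); ++ii) {
    for (int jj = 0; jj < (1 << ss_x); ++jj) {
      const int yy = (i << ss_y) + ii;  // Row on the Y-plane.
      const int xx = (j << ss_x) + jj;  // Column on the Y-plane.
      sum += luma_sq_diff[yy * luma_w + xx];
      ++*num_pixels;
    }
  }
  return sum;
}
```

With 4:2:0 subsampling, each chroma pixel picks up four Y-plane pixels;
with 4:4:4, it picks up exactly one. The returned sum and the pixel
count would be added to the chroma plane's own totals before the filter
weight is computed.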

NOTE: Whether the Y-plane should consider the information from the U and
V planes, and whether the U and V planes should consider the information
from each other, are still worth exploring. Also, whether cross-plane
information should be weighted equally with same-plane information is
another open question worth studying.

Since the plane-wise strategy is only applied to midres and hdres
videos, only results from these two test sets are reported.

Experimental results:

Under Speed-4 (two-pass mode):
        avg PSNR   ovr PSNR     SSIM   PSNR_Y   PSNR_U   PSNR_V
midres    -0.118     -0.088   -0.109   -0.017   -1.002   -0.783
hdres     -0.036     -0.029   -0.014    0.025   -0.520   -0.582

Under Speed-1 (two-pass mode):
        avg PSNR   ovr PSNR     SSIM   PSNR_Y   PSNR_U   PSNR_V
midres    -0.088     -0.090   -0.070   -0.004   -1.007   -0.845
hdres     -0.045     -0.055   -0.056    0.018   -0.511   -0.516

STATS_CHANGED

Change-Id: Ibf268d9a2ddc269e6282ff582582a77dcd244da7
diff --git a/av1/encoder/temporal_filter.c b/av1/encoder/temporal_filter.c
index c44c426..45561cb 100644
--- a/av1/encoder/temporal_filter.c
+++ b/av1/encoder/temporal_filter.c
@@ -870,6 +870,23 @@
           }
         }
 
+        // Filter U-plane and V-plane using Y-plane. This is because motion
+        // search is only done on Y-plane, so the information from Y-plane will
+        // be more accurate.
+        if (plane != 0) {
+          const int ss_y_shift = subsampling_y - mbd->plane[0].subsampling_y;
+          const int ss_x_shift = subsampling_x - mbd->plane[0].subsampling_x;
+          for (int ii = 0; ii < (1 << ss_y_shift); ++ii) {
+            for (int jj = 0; jj < (1 << ss_x_shift); ++jj) {
+              const int yy = (i << ss_y_shift) + ii;  // Y-coord on Y-plane.
+              const int xx = (j << ss_x_shift) + jj;  // X-coord on Y-plane.
+              const int ww = w << ss_x_shift;         // Width of Y-plane.
+              sum_square_diff += square_diff[yy * ww + xx];
+              ++num_ref_pixels;
+            }
+          }
+        }
+
         // Control factor for non-local mean approach.
         const double r =
             (double)decay_control * (0.7 + log(noise_levels[plane] + 1.0));
@@ -935,16 +952,12 @@
   assert(num_planes >= 1 && num_planes <= MAX_MB_PLANE);
 
   if (use_planewise_strategy) {  // Commonly used for high-resolution video.
-    // TODO(any): avx2 and sse version should also support high bit-depth.
-    if (is_frame_high_bitdepth(frame_to_filter)) {
-      av1_apply_temporal_filter_planewise_c(frame_to_filter, mbd, block_size,
-                                            mb_row, mb_col, num_planes,
-                                            noise_levels, pred, accum, count);
-    } else {
-      av1_apply_temporal_filter_planewise(frame_to_filter, mbd, block_size,
+    // TODO(any): avx2 and sse2 version should also support high bit-depth, and
+    // they should be changed to consider cross-plane information (see C
+    // function) before using.
+    av1_apply_temporal_filter_planewise_c(frame_to_filter, mbd, block_size,
                                           mb_row, mb_col, num_planes,
                                           noise_levels, pred, accum, count);
-    }
   } else {  // Commonly used for low-resolution video.
     const int adj_strength = strength + 2 * (mbd->bd - 8);
     if (num_planes == 1) {