Modify final refinement step in disflow

Skip refining the dense flow field at pyramid level 0, and instead
refine the generated correspondences after interpolation.

In theory, refining the generated correspondences is a bit more
accurate, as this can compensate for inaccuracies in the
interpolation step. In practice, we see only a small difference,
likely due to the number of other processing and filtering stages
between this and the eventual global motion parameter selection.

The main benefit is in encode time: pyramid level 0 has by far the
most flow vectors to refine, with one per 64 pixels of the source
image. Replacing this with one refine call per feature point, of
which there are generally far fewer than one per 64 pixels,
significantly reduces the runtime of the disflow step, especially at
larger resolutions.
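
As a rough illustration of the call-count argument above, here is a
minimal sketch. It assumes the level 0 flow field is laid out as one
vector per 8x8 block (which matches "one per 64 pixels"). The helper
names and the SKETCH_FLOW_GRID_STEP constant are hypothetical and do
not appear in libaom.

```c
#include <assert.h>

/* Assumed grid spacing: one flow vector per 8x8 block, i.e. one per 64
 * pixels. Illustrative name only, not a libaom identifier. */
#define SKETCH_FLOW_GRID_STEP 8

/* Number of refine calls needed to refine the dense level 0 flow field,
 * under the grid assumption above. */
static long sketch_dense_refine_calls(int width, int height) {
  assert(width > 0 && height > 0);
  return (long)(width / SKETCH_FLOW_GRID_STEP) *
         (long)(height / SKETCH_FLOW_GRID_STEP);
}

/* How many times more refine calls the dense approach needs compared to
 * one refine call per feature point. */
static double sketch_refine_call_ratio(int width, int height,
                                       int num_corners) {
  assert(num_corners > 0);
  return (double)sketch_dense_refine_calls(width, height) / num_corners;
}
```

Under these assumptions, a 1920x1080 frame needs 32400 dense refine
calls. With a hypothetical 1000 feature points, refining per
correspondence would need roughly 32 times fewer calls.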

An alternative was also tested, where we refined both the dense
flow field and the generated correspondences, but this erased all
of the encode time savings here for essentially no further BDRATE
gain.

 Speed | BDRATE-PSNR | BDRATE-SSIM |   Enc time
-------+-------------+-------------+-------------
   1   |   -0.010%   |   -0.015%   |   -0.075%
   2   |   -0.001%   |   -0.018%   |   -0.189%
   3   |    0.000%   |   +0.035%   |   -0.466%
   4   |   -0.021%   |   -0.011%   |   -0.689%

STATS_CHANGED

Change-Id: If04c5cac6114e188462360a110e2d8378bda1a7f
diff --git a/aom_dsp/flow_estimation/disflow.c b/aom_dsp/flow_estimation/disflow.c
index a010c81..855a44f 100644
--- a/aom_dsp/flow_estimation/disflow.c
+++ b/aom_dsp/flow_estimation/disflow.c
@@ -96,7 +96,9 @@
   return get_cubic_value_dbl(tmp, v_kernel);
 }
 
-static int determine_disflow_correspondence(CornerList *corners,
+static int determine_disflow_correspondence(const ImagePyramid *src_pyr,
+                                            const ImagePyramid *ref_pyr,
+                                            CornerList *corners,
                                             const FlowField *flow,
                                             Correspondence *correspondences) {
   const int width = flow->width;
@@ -134,10 +136,18 @@
     get_cubic_kernel_dbl(flow_sub_x, h_kernel);
     get_cubic_kernel_dbl(flow_sub_y, v_kernel);
 
-    const double flow_u = bicubic_interp_one(&flow->u[flow_y * stride + flow_x],
-                                             stride, h_kernel, v_kernel);
-    const double flow_v = bicubic_interp_one(&flow->v[flow_y * stride + flow_x],
-                                             stride, h_kernel, v_kernel);
+    double flow_u = bicubic_interp_one(&flow->u[flow_y * stride + flow_x],
+                                       stride, h_kernel, v_kernel);
+    double flow_v = bicubic_interp_one(&flow->v[flow_y * stride + flow_x],
+                                       stride, h_kernel, v_kernel);
+
+    // Refine the interpolated flow vector one last time
+    const int patch_tl_x = x0 - DISFLOW_PATCH_CENTER;
+    const int patch_tl_y = y0 - DISFLOW_PATCH_CENTER;
+    aom_compute_flow_at_point(
+        src_pyr->layers[0].buffer, ref_pyr->layers[0].buffer, patch_tl_x,
+        patch_tl_y, src_pyr->layers[0].width, src_pyr->layers[0].height,
+        src_pyr->layers[0].stride, &flow_u, &flow_v);
 
     // Use original points (without offsets) when filling in correspondence
     // array
@@ -469,7 +479,14 @@
   }
 
   // Compute flow field from coarsest to finest level of the pyramid
-  for (int level = src_pyr->n_levels - 1; level >= 0; --level) {
+  //
+  // Note: We stop after refining pyramid level 1 and interpolating it to
+  // generate an initial flow field at level 0. We do *not* refine the dense
+  // flow field at level 0. Instead, we wait until we have generated
+  // correspondences by interpolating this flow field, and then refine the
+  // correspondences themselves. This is both faster and gives better output
+  // compared to refining the flow field at level 0 and then interpolating.
+  for (int level = src_pyr->n_levels - 1; level >= 1; --level) {
     const PyramidLayer *cur_layer = &src_pyr->layers[level];
     const int cur_width = cur_layer->width;
     const int cur_height = cur_layer->height;
@@ -657,8 +674,8 @@
     return false;
   }
 
-  const int num_correspondences =
-      determine_disflow_correspondence(src_corners, flow, correspondences);
+  const int num_correspondences = determine_disflow_correspondence(
+      src_pyramid, ref_pyramid, src_corners, flow, correspondences);
 
   bool result = ransac(correspondences, num_correspondences, type,
                        motion_models, num_motion_models, mem_alloc_failed);