Fix bug in loop restoration costing

Loop restoration parameters are delta-coded against previous
parameter sets within the same tile, but are independent of
any parameter sets from other tiles. However, the LR search
code did not take this into account, resulting in incorrect
rate estimation when using more than one tile.

Fix this by iterating over LR units in encoding order, rather
than raster order, and properly resetting the reference params
at the start of each tile.
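
As a rough illustration of why the iteration order matters, here is a
minimal sketch in C. It is a toy model, not the libaom code: each LR unit
carries a single integer parameter, the "rate" of a unit is taken to be
|param - reference|, and the names rate_tile_order, rate_raster_order and
DEFAULT_REF are hypothetical. Tile sizes are given in LR units.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model: one integer parameter per LR unit, and a unit's
 * "rate" is |param - reference|. This is not libaom's actual cost model. */
enum { DEFAULT_REF = 0 };

/* Estimate the rate by visiting units tile by tile (encoding order),
 * resetting the reference at the start of every tile. */
static int rate_tile_order(const int *params, int horz_units, int vert_units,
                           int tile_w, int tile_h) {
  assert(tile_w > 0 && tile_h > 0);
  int rate = 0;
  for (int ty = 0; ty < vert_units; ty += tile_h) {
    for (int tx = 0; tx < horz_units; tx += tile_w) {
      /* No prediction across tile boundaries: start from the default. */
      int ref = DEFAULT_REF;
      const int y_end = ty + tile_h < vert_units ? ty + tile_h : vert_units;
      const int x_end = tx + tile_w < horz_units ? tx + tile_w : horz_units;
      for (int y = ty; y < y_end; ++y) {
        for (int x = tx; x < x_end; ++x) {
          const int p = params[y * horz_units + x];
          rate += abs(p - ref);
          ref = p; /* the next unit in this tile is coded against this one */
        }
      }
    }
  }
  return rate;
}

/* Estimate the rate in raster order over the whole frame with a single
 * running reference, ignoring tile boundaries. */
static int rate_raster_order(const int *params, int horz_units,
                             int vert_units) {
  int rate = 0;
  int ref = DEFAULT_REF;
  for (int i = 0; i < horz_units * vert_units; ++i) {
    rate += abs(params[i] - ref);
    ref = params[i];
  }
  return rate;
}
```

For a 4x2 grid of units holding {5,5,9,9, 5,5,9,9} split into two 2x2
tiles, the two functions return different totals in this toy model,
because the raster walk keeps predicting across the tile boundary. With a
single tile covering the whole grid the two estimates coincide, which is
consistent with the 1-tile result below.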

1 tile: No change to encode time or output

4 tiles (2x2): Encode time neutral, BDRATE impact:

 Speed | lowres2 | midres2 |  hdres2
-------+---------+---------+---------
   1   | -0.037% | -0.018% | +0.009%
   2   | -0.033% | +0.001% | -0.013%
   3   | +0.024% | -0.022% | -0.011%
   4   | -0.024% | +0.013% | -0.045%
   5   | -0.021% | +0.004% | -0.026%
   6   | -0.011% | -0.031% | +0.010%

STATS_CHANGED

Change-Id: Ie7e8c7a33b219b132c0654d631b403c296d64987
diff --git a/av1/encoder/bitstream.c b/av1/encoder/bitstream.c
index a52bf8f..b20379e 100644
--- a/av1/encoder/bitstream.c
+++ b/av1/encoder/bitstream.c
@@ -1622,6 +1622,10 @@
   const int num_planes = av1_num_planes(cm);
   for (int plane = 0; plane < num_planes; ++plane) {
     int rcol0, rcol1, rrow0, rrow1;
+
+    // Skip some unnecessary work if loop restoration is disabled
+    if (cm->rst_info[plane].frame_restoration_type == RESTORE_NONE) continue;
+
     if (av1_loop_restoration_corners_in_sb(cm, plane, mi_row, mi_col, bsize,
                                            &rcol0, &rcol1, &rrow0, &rrow1)) {
       const int rstride = cm->rst_info[plane].horz_units;