[CFL] Reorganize Reconstructed Pixel Buffering

Rework how the storage flag (cfl_store_y) is set to avoid duplication on
the encoder side. Reconstructed luma pixels are stored in two places: in
encode_superblock, in the loop that calls av1_encode_intra_block_plane,
and in the extra call to txfm_rd_in_plane that runs after the luma RDO
but before the chroma RDO.
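
For illustration only, the guarded call in rdopt.c can be modelled with
the minimal sketch below. MockMacroblock, mock_luma_pass and
store_luma_before_chroma_rdo are hypothetical names that exist only in
this sketch; the real txfm_rd_in_plane also computes RD statistics,
which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the encoder's MACROBLOCK. Only the storage
 * flag is kept, plus a counter used to observe how often a store runs. */
typedef struct {
  int cfl_store_y;      /* nonzero: the next luma pass stores the recon */
  int luma_store_count; /* illustration only: number of stores done */
} MockMacroblock;

/* Assumed stand-in for the luma call to txfm_rd_in_plane: it stores the
 * reconstructed luma only when the flag is set. */
static void mock_luma_pass(MockMacroblock *x) {
  assert(x != NULL);
  if (x->cfl_store_y) x->luma_store_count++;
}

/* Mirrors the guarded pattern: the extra luma pass runs only when the
 * flag is set, and the flag is cleared right after it. */
static void store_luma_before_chroma_rdo(MockMacroblock *x, int want_store) {
  assert(x != NULL);
  x->cfl_store_y = want_store;
  if (x->cfl_store_y) {
    mock_luma_pass(x);
    x->cfl_store_y = 0;
  }
}
```

With the flag clear, the extra pass is skipped entirely; with it set,
exactly one store happens and the flag is reset before the chroma RDO.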

This change does not alter the bitstream.

Change-Id: Ifd8441363ea0733fea3d06129a025940abb2abc9
diff --git a/av1/encoder/rdopt.c b/av1/encoder/rdopt.c
index 180001a..695b4c4 100644
--- a/av1/encoder/rdopt.c
+++ b/av1/encoder/rdopt.c
@@ -9843,12 +9843,12 @@
 #else
     x->cfl_store_y = 1;
 #endif  // CONFIG_CB4X4
-
-    txfm_rd_in_plane(x, cpi, &this_rd_stats, INT64_MAX, AOM_PLANE_Y,
-                     mbmi->sb_type, mbmi->tx_size,
-                     cpi->sf.use_fast_coef_costing);
-
-    x->cfl_store_y = 0;
+    if (x->cfl_store_y) {
+      txfm_rd_in_plane(x, cpi, &this_rd_stats, INT64_MAX, AOM_PLANE_Y,
+                       mbmi->sb_type, mbmi->tx_size,
+                       cpi->sf.use_fast_coef_costing);
+      x->cfl_store_y = 0;
+    }
 #endif  // CONFIG_CFL
     max_uv_tx_size = uv_txsize_lookup[bsize][mbmi->tx_size][pd[1].subsampling_x]
                                      [pd[1].subsampling_y];