Add ML partition search breakout in rd_pick_sqr_partition

This is the same feature used in rd_pick_partition after trying
PARTITION_NONE, and only implemented models for small resolutions
(less than 720P).

AWCY objective-1-fast(360P) result of Speed 1: -0.09

Tested BasketballDrill_832x480_50 (20 frames, bitrate=800, Speed 1),
it is 3.5% faster.

STATS_CHANGED

Change-Id: Id6c133ecb0f07e96490d5b06ef5c3602aa4b6483
diff --git a/av1/encoder/encodeframe.c b/av1/encoder/encodeframe.c
index f1e855e..4e07ff2 100644
--- a/av1/encoder/encodeframe.c
+++ b/av1/encoder/encodeframe.c
@@ -64,6 +64,10 @@
                               ThreadData *td, TOKENEXTRA **t, RUN_TYPE dry_run,
                               int mi_row, int mi_col, BLOCK_SIZE bsize,
                               int *rate);
+static int ml_predict_breakout(const AV1_COMP *const cpi, BLOCK_SIZE bsize,
+                               const MACROBLOCK *const x,
+                               const RD_STATS *const rd_stats,
+                               unsigned int pb_source_variance);
 
 // This is used as a reference when computing the source variance for the
 //  purposes of activity masking.
@@ -2516,15 +2520,30 @@
 
         pc_tree->cb_search_range = SEARCH_FULL_PLANE;
 
-        // If all y, u, v transform blocks in this partition are skippable, and
-        // the dist & rate are within the thresholds, the partition search is
-        // terminated for current branch of the partition search tree.
-        // The dist & rate thresholds are set to 0 at speed 0 to disable the
-        // early termination at that speed.
-        if (!x->e_mbd.lossless[xd->mi[0]->segment_id] &&
-            (ctx_none->skippable && best_rdc.dist < dist_breakout_thr &&
-             best_rdc.rate < rate_breakout_thr)) {
-          do_square_split = 0;
+        if (!x->e_mbd.lossless[xd->mi[0]->segment_id] && ctx_none->skippable) {
+          const int use_ml_based_breakout =
+              bsize <= cpi->sf.use_square_partition_only_threshold &&
+              bsize > BLOCK_4X4 && xd->bd == 8;
+
+          // TODO(anyone): Currently this is using the same model and threshold
+          // values as in rd_pick_partition. Retraining the model and tuning the
+          // threshold values might be helpful to improve the speed.
+          if (use_ml_based_breakout) {
+            if (ml_predict_breakout(cpi, bsize, x, &this_rdc,
+                                    x->source_variance)) {
+              do_square_split = 0;
+            }
+          }
+
+          // If all y, u, v transform blocks in this partition are skippable,
+          // and the dist & rate are within the thresholds, the partition search
+          // is terminated for current branch of the partition search tree. The
+          // dist & rate thresholds are set to 0 at speed 0 to disable the early
+          // termination at that speed.
+          if (best_rdc.dist < dist_breakout_thr &&
+              best_rdc.rate < rate_breakout_thr) {
+            do_square_split = 0;
+          }
         }
       }
     }