Rework perform_coeff_opt speed feature

Rework the speed feature so that coefficient optimization
is skipped based on mse/qstep^2 rather than on
mse alone.
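
A minimal sketch of the new decision, for illustration only. The
helper name perform_coeff_opt_sketch is hypothetical and is not part
of libaom; it only mirrors the comparison added in tx_search.c, and
assumes qstep has already been derived from the dequant value:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not part of libaom: returns 1 when coefficient
 * optimization should run for a block and 0 when it should be skipped. */
static int perform_coeff_opt_sketch(unsigned int block_mse_q8,
                                    unsigned int dist_threshold,
                                    int qstep) {
  /* Widen to 64 bits before multiplying, since a threshold of UINT_MAX
   * times qstep^2 does not fit in 32 bits. */
  const uint64_t scaled_threshold =
      (uint64_t)dist_threshold * (uint64_t)qstep * (uint64_t)qstep;
  /* Equivalent to testing mse / qstep^2 <= threshold without a division. */
  return (uint64_t)block_mse_q8 <= scaled_threshold;
}
```

With a threshold of 1728 and qstep 16 the scaled limit is 442368,
close to the previous fixed value of 442413, so the two schemes
roughly agree at that qstep and diverge as qstep moves away from it.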

Seems to provide a 5-15% speed-up for complex videos
at speeds 1 and 2.
BDRATE (hdres2, 17 frames, q mode) shows a slight improvement.

The thresholds still need some tuning for optimal
performance.

STATS_CHANGED

Change-Id: I67133f980b6201d42546339b45ff4910541ebab0
diff --git a/av1/encoder/speed_features.c b/av1/encoder/speed_features.c
index 0eb0241..6f85c8a 100644
--- a/av1/encoder/speed_features.c
+++ b/av1/encoder/speed_features.c
@@ -64,7 +64,7 @@
                                                                  { 2, 2, 0 } };
 
 // Threshold values to be used for disabling coeff RD-optimization
-// based on block MSE
+// based on block MSE / qstep^2.
 // TODO(any): Experiment the threshold logic based on variance metric
 // Index 0: Default mode evaluation, Winner mode processing is not applicable
 // (Eg : IntraBc) Index 1: Mode evaluation. Index 2: Winner mode evaluation.
@@ -72,10 +72,10 @@
 // feature is ON
 static unsigned int coeff_opt_dist_thresholds[5][MODE_EVAL_TYPES] = {
   { UINT_MAX, UINT_MAX, UINT_MAX },
-  { 442413, 36314, UINT_MAX },
-  { 162754, 36314, UINT_MAX },
-  { 22026, 22026, UINT_MAX },
-  { 22026, 22026, UINT_MAX }
+  { 1728, 142, UINT_MAX },
+  { 864, 142, UINT_MAX },
+  { 432, 86, UINT_MAX },
+  { 216, 86, UINT_MAX }
 };
 
 // Transform size to be used for default, mode and winner mode evaluation
diff --git a/av1/encoder/tx_search.c b/av1/encoder/tx_search.c
index 5f905c9..21560a8 100644
--- a/av1/encoder/tx_search.c
+++ b/av1/encoder/tx_search.c
@@ -2162,11 +2162,18 @@
   }
   block_sse *= 16;
 
-  // Used mse based threshold logic to take decision of R-D of optimization of
-  // coeffs. For smaller residuals, coeff optimization would be helpful. For
-  // larger residuals, R-D optimization may not be effective.
+  const int dequant_shift = (is_cur_buf_hbd(xd)) ? xd->bd - 5 : 3;
+  const int qstep = x->plane[plane].dequant_QTX[1] >> dequant_shift;
+
+  // Decide whether to run R-D optimization of the coeffs by comparing the
+  // block mse against a threshold scaled by qstep^2. Coeff optimization
+  // helps for smaller residuals; for larger residuals it may not be
+  // effective.
   // TODO(any): Experiment with variance and mean based thresholds
-  perform_block_coeff_opt = (block_mse_q8 <= x->coeff_opt_dist_threshold);
+  perform_block_coeff_opt =
+      ((uint64_t)block_mse_q8 <=
+       (uint64_t)x->coeff_opt_dist_threshold * qstep * qstep);
+
   skip_trellis |= !perform_block_coeff_opt;
 
   // Tranform domain distortion is accurate for higher residuals.