RTC: Reduce number of modes going into for-loop

Currently, only single-reference prediction is used in real-time
speeds 5 and 6. This CL narrows the range of inter modes that the
for-loop iterates over, which eliminates the unneeded branching and
initialization for modes that would be skipped anyway. Encoder
output is unchanged.

If we later decide to use compound modes in real time, a similar
approach should be used, e.g. by modifying the
av1_default_mode_order table.

Borg results:
            avg_psnr:  ovr_psnr:  ssim:  speedup:
RT speed 5:
rtc:         0.000      0.000     0.000   2.263
rtc_derf:    0.000      0.000     0.000   2.877
RT speed 6:
rtc:         0.000      0.000     0.000   2.523
rtc_derf:    0.000      0.000     0.000   3.648
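
The range selection described above can be sketched in isolation as
follows. This is a minimal illustration, not the encoder code: every
DEMO_/Demo name and every enum value is hypothetical, and it assumes
(as SINGLE_REF_MODE_START/SINGLE_REF_MODE_END suggest) that the
single-reference modes occupy one contiguous range of the mode order.

```c
#include <assert.h>

/* Hypothetical stand-in for the frame-level reference mode. */
typedef enum {
  DEMO_SINGLE_REFERENCE,
  DEMO_REFERENCE_MODE_SELECT
} DemoReferenceMode;

/* Illustrative index bounds only; the real mode table is much larger. */
enum {
  DEMO_INTER_MODE_START = 0,
  DEMO_SINGLE_REF_MODE_START = 0,
  DEMO_SINGLE_REF_MODE_END = 4, /* one past the last single-ref mode */
  DEMO_INTER_MODE_END = 10      /* one past the last inter mode */
};

typedef struct {
  int start; /* first mode index to visit */
  int end;   /* one past the last mode index to visit */
} DemoModeRange;

/* Choose the loop bounds once, before the loop, instead of entering
 * the loop body for compound modes and rejecting them one by one. */
static DemoModeRange demo_select_mode_range(DemoReferenceMode reference_mode) {
  DemoModeRange range = { DEMO_INTER_MODE_START, DEMO_INTER_MODE_END };
  if (reference_mode == DEMO_SINGLE_REFERENCE) {
    range.start = DEMO_SINGLE_REF_MODE_START;
    range.end = DEMO_SINGLE_REF_MODE_END;
  }
  return range;
}

/* Count how many times the mode loop body would run. */
static int demo_count_loop_iterations(DemoReferenceMode reference_mode) {
  const DemoModeRange range = demo_select_mode_range(reference_mode);
  assert(range.start <= range.end);
  int iterations = 0;
  for (int midx = range.start; midx < range.end; ++midx) {
    ++iterations; /* per-mode setup and RD search would go here */
  }
  return iterations;
}
```

With these illustrative bounds, the single-reference case enters the
loop body 4 times instead of 10, so the per-iteration setup for the
remaining modes is never executed.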

Change-Id: Ibf1938f3b7c45b140982ccfa4297500536b504df
diff --git a/av1/encoder/rdopt.c b/av1/encoder/rdopt.c
index 4db0abe..23c0eef 100644
--- a/av1/encoder/rdopt.c
+++ b/av1/encoder/rdopt.c
@@ -5480,8 +5480,18 @@
   // with av1_default_mode_order to get the enum that defines the mode, which
   // can be used with av1_mode_defs to get the prediction mode and the ref
   // frames.
-  for (THR_MODES midx = THR_INTER_MODE_START; midx < THR_INTER_MODE_END;
-       ++midx) {
+  // TODO(yunqing, any): Setting mode_start and mode_end outside for-loop brings
+  // good speedup for real time case. If we decide to use compound mode in real
+  // time, maybe we can modify av1_default_mode_order table.
+  THR_MODES mode_start = THR_INTER_MODE_START;
+  THR_MODES mode_end = THR_INTER_MODE_END;
+  const CurrentFrame *const current_frame = &cm->current_frame;
+  if (current_frame->reference_mode == SINGLE_REFERENCE) {
+    mode_start = SINGLE_REF_MODE_START;
+    mode_end = SINGLE_REF_MODE_END;
+  }
+
+  for (THR_MODES midx = mode_start; midx < mode_end; ++midx) {
     // Get the actual prediction mode we are trying in this iteration
     const THR_MODES mode_enum = av1_default_mode_order[midx];
     const MODE_DEFINITION *mode_def = &av1_mode_defs[mode_enum];