K-means: check for centroid equality first

Perform this check as early as possible, before the indices and
distance computation. The saving is worthwhile because usually only
the computation before the for loop is needed: the initial centroids
are reasonably good (equally spaced), so one refinement step is often
enough.
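
For illustration only, a minimal 1-D sketch of the reordered loop. The
names kmeans_1d, assign_indices, update_centroids and MAX_K are
hypothetical and are not the libaom functions; the sketch assumes
non-negative integer samples and k <= MAX_K, and it leaves out the
double buffering and the distance-increase check that the real template
performs. It shows only that comparing the old and new centroids right
after the centroid update lets the loop skip one full assignment pass.

```c
#include <stdint.h>
#include <string.h>

#define MAX_K 8 /* hypothetical upper bound on k for this sketch */

/* Assign each sample to its nearest centroid; return the total squared
 * distance. This is the expensive pass the early check tries to skip. */
static int64_t assign_indices(const int *data, const int *centroids,
                              uint8_t *indices, int n, int k) {
  int64_t dist = 0;
  for (int i = 0; i < n; ++i) {
    int64_t best = -1;
    for (int j = 0; j < k; ++j) {
      const int64_t diff = (int64_t)data[i] - centroids[j];
      const int64_t d = diff * diff;
      if (best < 0 || d < best) {
        best = d;
        indices[i] = (uint8_t)j;
      }
    }
    dist += best;
  }
  return dist;
}

/* Recompute each centroid as the rounded mean of its samples. An empty
 * cluster keeps its previous centroid (a simplification). */
static void update_centroids(const int *data, const int *prev, int *next,
                             const uint8_t *indices, int n, int k) {
  int64_t sum[MAX_K] = { 0 };
  int count[MAX_K] = { 0 };
  for (int i = 0; i < n; ++i) {
    sum[indices[i]] += data[i];
    ++count[indices[i]];
  }
  for (int j = 0; j < k; ++j) {
    next[j] = count[j] ? (int)((sum[j] + count[j] / 2) / count[j]) : prev[j];
  }
}

/* Runs k-means in place and returns the number of assignment passes. */
int kmeans_1d(const int *data, int *centroids, uint8_t *indices, int n,
              int k, int max_itr) {
  int next[MAX_K];
  int passes = 1;
  assign_indices(data, centroids, indices, n, k); /* before the loop */
  for (int i = 0; i < max_itr; ++i) {
    update_centroids(data, centroids, next, indices, n, k);
    /* Early exit: unchanged centroids would reproduce the same indices,
     * so the assignment pass below is not needed. */
    if (!memcmp(next, centroids, sizeof(next[0]) * k)) break;
    memcpy(centroids, next, sizeof(next[0]) * k);
    assign_indices(data, centroids, indices, n, k);
    ++passes;
  }
  return passes;
}
```

When the initial centroids are already stable, the sketch performs a
single assignment pass, which is the common case described above.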

On rtc_screen, speed 0, 50 runs (speed-ups in %):

File	encoding_spdup:
screen_recording_crd.1920_1080.y4m	0.087
screenshare_buganizer.1900_1306.y4m	0.168
screenshare_colorslides.1820_1320.y4m	0.246
screenshare_slidechanges.1850_1110.y4m	0.254
screenshare_youtube.1680_1178.y4m	0.235
slides_webplot.1920_1080.y4m	0.245
sc_web_browsing720p.y4m	0.331
screen_crd_colwinscroll.1920_1128.y4m	0.255
{OVERALL}	0.227

Speed 10:

File	encoding_spdup:
screen_recording_crd.1920_1080.y4m	0.243
screenshare_buganizer.1900_1306.y4m	0.362
screenshare_colorslides.1820_1320.y4m	0.340
screenshare_slidechanges.1850_1110.y4m	0.511
screenshare_youtube.1680_1178.y4m	0.262
slides_webplot.1920_1080.y4m	0.366
sc_web_browsing720p.y4m	0.418
screen_crd_colwinscroll.1920_1128.y4m	0.329
{OVERALL}	0.354

Change-Id: I87f579179feb84308846cb47fced26ac815dbc27
diff --git a/av1/encoder/k_means_template.h b/av1/encoder/k_means_template.h
index 31ffdcf..4be2038 100644
--- a/av1/encoder/k_means_template.h
+++ b/av1/encoder/k_means_template.h
@@ -123,6 +123,10 @@
     l = (l == 1) ? 0 : 1;
 
     RENAME(calc_centroids)(data, meta_centroids[l], meta_indices[prev_l], n, k);
+    if (!memcmp(meta_centroids[l], meta_centroids[prev_l],
+                sizeof(centroids[0]) * k * AV1_K_MEANS_DIM)) {
+      break;
+    }
 #if AV1_K_MEANS_DIM == 1
     av1_calc_indices_dim1(data, meta_centroids[l], meta_indices[l], &this_dist,
                           n, k);
@@ -135,9 +139,6 @@
       best_l = prev_l;
       break;
     }
-    if (!memcmp(meta_centroids[l], meta_centroids[prev_l],
-                sizeof(centroids[0]) * k * AV1_K_MEANS_DIM))
-      break;
   }
   if (i == max_itr) best_l = l;
   if (best_l != 0) {