Allintra: Set superblock size as 64x64 for speed>=9

A 128x128 superblock may not be required for allintra encodes,
since the maximum partition size is 32x32 for speed>=6. This CL
forces the SB size to 64x64, which improves multithread performance
by reducing top-right delay and thread sync wastage. The change
applies only to speed>=9 and resolutions below 4K (smaller frame
dimension less than 2160), where the cost update frequency is set
to INTERNAL_COST_UPD_OFF.
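
The selection rule described above can be sketched as a standalone C
function. This is a minimal illustration under stated assumptions, not
the libaom code: allintra_sb_size, enum sb_size and min_int are
hypothetical names, and the sketch covers only the rule added here,
ignoring the other conditions the real function checks first.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the encoder's block-size enum. */
enum sb_size { SB_64X64 = 64, SB_128X128 = 128 };

/* Hypothetical helper; returns the smaller of two ints. */
static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * Sketch of the allintra rule only: pick a 64x64 superblock when the
 * encode is allintra, speed is 9 or higher, and the smaller frame
 * dimension is below 2160. Otherwise fall back to 128x128.
 */
static enum sb_size allintra_sb_size(bool is_allintra, int speed,
                                     int width, int height) {
  const bool is_4k_or_larger = min_int(width, height) >= 2160;
  if (is_allintra && speed >= 9 && !is_4k_or_larger) return SB_64X64;
  return SB_128X128;
}
```

Under this sketch a 1920x1080 allintra encode at speed 9 selects 64x64,
while 3840x2160 at the same speed, or 1920x1080 at speed 8, stays at
128x128.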

For AVIF still-image encode:

             Encode Time     BD-Rate Loss(%)
cpu-used     Reduction(%)    psnr       ssim
   9           1.972        -0.0135    -0.0166

For AVIF still-image encode, an average encode time reduction of
~4.7% is observed across different resolutions with speed=9 and
threads=4.

STATS_CHANGED for speed=9

Change-Id: I90ba9461f53a28bdc95c8233ca9ec5ec62472005
diff --git a/av1/encoder/encoder_utils.c b/av1/encoder/encoder_utils.c
index f15b82c..c6bff2e 100644
--- a/av1/encoder/encoder_utils.c
+++ b/av1/encoder/encoder_utils.c
@@ -826,6 +826,16 @@
     if (!is_480p_or_lesser && is_1080p_or_lesser && oxcf->mode == GOOD &&
         oxcf->row_mt == 1 && oxcf->max_threads > 1 && oxcf->speed >= 5)
       return BLOCK_64X64;
+
+    // For allintra encode, since the maximum partition size is set to 32X32 for
+    // speed>=6, superblock size is set to 64X64 instead of 128X128. This
+    // improves the multithread performance due to reduction in top right delay
+    // and thread sync wastage. Currently, this setting is selectively enabled
+    // only for speed>=9 and resolutions less than 4k since cost update
+    // frequency is set to INTERNAL_COST_UPD_OFF in these cases.
+    const int is_4k_or_larger = AOMMIN(width, height) >= 2160;
+    if (oxcf->mode == ALLINTRA && oxcf->speed >= 9 && !is_4k_or_larger)
+      return BLOCK_64X64;
   }
   return BLOCK_128X128;
 }