Superblock qp sweep experiment given fixed rdmult

This commit sweeps multiple superblock (sb) qindexes for a
given rdmult (lambda) to minimize the rdcost. The
code changes are controlled by a new encoder argument,
"sb-qp-sweep", which is disabled by default for now.
A unit test is added to ensure that end-to-end encoding
and decoding run correctly.
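
The idea can be illustrated with a minimal, self-contained sketch. It is
not the code in this commit: TrialStats, toy_trial_encode, rd_cost and
sweep_sb_qindex are hypothetical names, the rate/distortion model is a toy
stand-in for a real trial encode, and the cost is assumed to take the
simple form dist + rdmult * rate. For a fixed rdmult, each candidate
qindex around the base is tried and the one with the lowest cost is kept.

```c
#include <stdint.h>

/* Hypothetical result of trial-encoding one superblock at a given qindex. */
typedef struct {
  int64_t rate; /* bits spent (toy units) */
  int64_t dist; /* distortion (toy units) */
} TrialStats;

/* Toy stand-in for a real trial encode (assumption, not libaom code):
 * rate falls linearly and distortion grows quadratically with qindex. */
static TrialStats toy_trial_encode(int qindex) {
  TrialStats s;
  s.rate = 4000 - 12 * (int64_t)qindex;
  s.dist = 3 * (int64_t)qindex * qindex;
  return s;
}

/* Assumed simplified RD cost: J = D + lambda * R, with lambda = rdmult. */
static int64_t rd_cost(int64_t rdmult, TrialStats s) {
  return s.dist + rdmult * s.rate;
}

/* Sweep qindex over [base - range, base + range] in increments of step,
 * keeping rdmult fixed, and return the qindex with the lowest RD cost.
 * The base qindex is the initial best, so ties keep the earlier candidate. */
static int sweep_sb_qindex(int base_qindex, int range, int step,
                           int64_t rdmult) {
  int best_q = base_qindex;
  int64_t best_cost = rd_cost(rdmult, toy_trial_encode(base_qindex));
  for (int delta = -range; delta <= range; delta += step) {
    const int q = base_qindex + delta;
    if (q < 0 || q > 255) continue; /* stay inside the valid qindex range */
    const int64_t cost = rd_cost(rdmult, toy_trial_encode(q));
    if (cost < best_cost) {
      best_cost = cost;
      best_q = q;
    }
  }
  return best_q;
}
```

With this toy model, sweep_sb_qindex(100, 40, 10, 40) selects qindex 80
rather than the base of 100, which mirrors the point of the experiment:
the qindex that minimizes the rdcost for a given lambda need not be the
one the default mapping picks.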

Borg test results are provided below (BD-rate change in
percent; negative values are gains). Encoding is roughly
4x slower. The BD-rate gains indicate some headroom for
better lambda-to-qp mappings.

		avg_psnr	ovr_psnr	ssim	vmaf
speed=1	lowres	-1.031	-0.986	-1.231	-1.029
	midres2	-1.288	-1.217	-1.424	-1.217
	ugc480p	-0.888	-0.864	-1.15	-0.968

speed=2	lowres	-1.049	-1.005	-1.327	-1.399
	midres2	-1.274	-1.207	-1.446	-1.253
	ugc480p	-0.95	-0.908	-1.172	-0.851

speed=3	lowres	-1.238	-1.193	-1.482	-1.254
	midres2	-1.48	-1.401	-1.562	-1.219
	ugc480p	-1.026	-0.959	-1.315	-1.063

speed=4	lowres	-1.535	-1.479	-1.812	-1.928
	midres2	-1.628	-1.584	-1.905	-1.662
	ugc480p	-1.193	-1.079	-1.585	-1.269

speed=5	lowres	-1.95	-1.874	-2.158	-2.003
	midres2	-2.18	-2.098	-2.434	-1.856
	ugc480p	-1.414	-1.304	-1.842	-1.469

Change-Id: I3dae74c7ac0d557fac498cb7f0cc9ae0d23be29e
diff --git a/av1/arg_defs.c b/av1/arg_defs.c
index acda4f8..abfd4b3 100644
--- a/av1/arg_defs.c
+++ b/av1/arg_defs.c
@@ -679,5 +679,9 @@
       "key frame (-1 to 5). When set to -1 (default), it does not have any "
       "effect. The actual maximum pyramid height will be the minimum of this "
       "value and the value of gf_max_pyr_height."),
+  .sb_qp_sweep =
+      ARG_DEF(NULL, "sb-qp-sweep", 1,
+              "When set to 1, enable the superblock level qp sweep for a "
+              "given lambda to minimize the rdcost."),
 #endif  // CONFIG_AV1_ENCODER
 };