Speed up av1_optimize_b

1. Move the get_txb_ctx call out of av1_optimize_b and
 into its caller, outside of the RDO loops.
2. Remove the av1_get_tx_type call; tx_type is now
 passed in by the caller.
3. The encoder is about 1.3% faster when encoding
20 frames of BasketballDrill_832x480_50.y4m, with no
coding loss (601278 ms -> 592634 ms).

a) gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
b) CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
c) Config cmd:
cmake ../ -DENABLE_CCACHE=1 -DCONFIG_LOWBITDEPTH=1
d) Test cmd:
./aomenc --cpu-used=1 --end-usage=vbr \
--target-bitrate=800 --limit=20
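
The idea behind points 1 and 2 can be sketched as follows. This is a
minimal illustration only, not the libaom code: every name in it
(DemoTxbCtx, demo_get_ctx, demo_optimize_before, demo_optimize_after,
demo_search_before, demo_search_after) is hypothetical, and the cost
formula is invented. It assumes the context depends only on the
above/left neighbour data, so it is the same for every candidate tried
in the search loop and can be derived once by the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a transform-block context (not TXB_CTX). */
typedef struct {
  int skip_ctx;
  int dc_sign_ctx;
} DemoTxbCtx;

/* Counts how often the context is derived, to make the saving visible. */
static int demo_ctx_calls = 0;

/* Derive the context from above/left neighbour data (illustrative only). */
static DemoTxbCtx demo_get_ctx(const signed char *above,
                               const signed char *left, size_t n) {
  assert(n > 0);
  DemoTxbCtx ctx = { 0, 0 };
  for (size_t i = 0; i < n; ++i) {
    ctx.skip_ctx += (above[i] != 0) + (left[i] != 0);
    ctx.dc_sign_ctx += above[i] + left[i];
  }
  ++demo_ctx_calls;
  return ctx;
}

/* Before: the callee re-derives the context on every call. */
static int demo_optimize_before(const signed char *above,
                                const signed char *left, size_t n,
                                int candidate) {
  const DemoTxbCtx ctx = demo_get_ctx(above, left, n);
  return candidate * 2 + ctx.skip_ctx; /* invented cost */
}

/* After: the caller passes in a context it derived once. */
static int demo_optimize_after(const DemoTxbCtx *ctx, int candidate) {
  return candidate * 2 + ctx->skip_ctx; /* same invented cost */
}

/* Search loop that derives the context once per candidate. */
static int demo_search_before(const signed char *above,
                              const signed char *left, size_t n,
                              int num_candidates) {
  int best = INT_MAX;
  for (int c = 0; c < num_candidates; ++c) {
    const int cost = demo_optimize_before(above, left, n, c);
    if (cost < best) best = cost;
  }
  return best;
}

/* Search loop with the context derivation hoisted out of the loop. */
static int demo_search_after(const signed char *above,
                             const signed char *left, size_t n,
                             int num_candidates) {
  const DemoTxbCtx ctx = demo_get_ctx(above, left, n);
  int best = INT_MAX;
  for (int c = 0; c < num_candidates; ++c) {
    const int cost = demo_optimize_after(&ctx, c);
    if (cost < best) best = cost;
  }
  return best;
}
```

Both search functions return the same best cost; the second derives
the context once instead of once per candidate, which is where a
saving of this kind would come from.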

Change-Id: I755b337e29316f4ceed37c9b669aebb4ad2d5fac
diff --git a/av1/encoder/encodemb.h b/av1/encoder/encodemb.h
index 1be2ce0..673f87e 100644
--- a/av1/encoder/encodemb.h
+++ b/av1/encoder/encodemb.h
@@ -15,6 +15,7 @@
 #include "config/aom_config.h"
 
 #include "av1/common/onyxc_int.h"
+#include "av1/common/txb_common.h"
 #include "av1/encoder/block.h"
 #include "av1/encoder/tokenize.h"
 #ifdef __cplusplus
@@ -53,9 +54,8 @@
                      AV1_XFORM_QUANT xform_quant_idx);
 
 int av1_optimize_b(const struct AV1_COMP *cpi, MACROBLOCK *mb, int plane,
-                   int blk_row, int blk_col, int block, BLOCK_SIZE plane_bsize,
-                   TX_SIZE tx_size, const ENTROPY_CONTEXT *a,
-                   const ENTROPY_CONTEXT *l, int fast_mode, int *rate_cost);
+                   int block, TX_SIZE tx_size, TX_TYPE tx_type,
+                   const TXB_CTX *const txb_ctx, int fast_mode, int *rate_cost);
 
 void av1_subtract_txb(MACROBLOCK *x, int plane, BLOCK_SIZE plane_bsize,
                       int blk_col, int blk_row, TX_SIZE tx_size);