Don't use TX_64X64 on a subsampled chroma plane

With ext-partition and tx64x64 enabled, the encoder could choose
TX_64X64 as the transform size for a subsampled plane of a
BLOCK_128X128 block. This violated an assumption in the nested loop in
write_tokens_b, namely that a transform block is no larger than the
unit the loop steps by, and also caused bug 827 (with a rather cryptic
failure).

This patch adds a parameter to get_vartx_max_txsize that says whether
the plane is subsampled, so that neither the encoder nor the decoder
picks TX_64X64 in this situation. It also adds two assertions before
the loop mentioned above, so that any future violation of the
assumption fails immediately with an obvious cause.
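
The arithmetic behind the new assertions can be sketched as follows.
This is an illustration only, not the libaom code: every helper name
below is hypothetical, and it assumes a 64-pixel luma unit, sizes
counted in 4x4 units, and a clamp to 32 pixels on subsampled planes
(the actual body of get_vartx_max_txsize is not shown in this diff).

```c
#include <assert.h>

/* Hypothetical helper: transform width expressed in 4x4 units. */
static int tx_wide_units(int tx_pixels) { return tx_pixels >> 2; }

/* Hypothetical helper: width, in 4x4 units, of the unit the loop steps
 * by in a given plane. Assumes a 64-pixel luma unit that shrinks with
 * the plane's horizontal subsampling shift. */
static int mu_blocks_wide_for_plane(int subsampling_x) {
  return (64 >> subsampling_x) >> 2;
}

/* Hypothetical stand-in for the clamp: on a subsampled plane, cap the
 * maximum transform width at 32 pixels. This is an assumption about
 * what the patched function does, not a copy of it. */
static int clamp_max_tx_pixels(int tx_pixels, int subsampled) {
  if (subsampled && tx_pixels > 32) return 32;
  return tx_pixels;
}

/* Mirrors the shape of the new check: bkw <= mu_blocks_wide. */
static int fits_in_unit(int tx_pixels, int subsampling_x) {
  return tx_wide_units(tx_pixels) <= mu_blocks_wide_for_plane(subsampling_x);
}
```

Under these assumptions a 64-pixel transform spans 16 units while a
subsampled plane's unit spans only 8, so fits_in_unit(64, 1) is false;
after the clamp, fits_in_unit(clamp_max_tx_pixels(64, 1), 1) is true.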

BUG=aomedia:827

Change-Id: Ie093f2f20f6242949d68e950c8f95b100867ee17
diff --git a/av1/encoder/bitstream.c b/av1/encoder/bitstream.c
index 9401f1e..00961e9 100644
--- a/av1/encoder/bitstream.c
+++ b/av1/encoder/bitstream.c
@@ -1761,7 +1761,7 @@
       !(is_inter && skip) && !xd->lossless[segment_id]) {
 #if CONFIG_VAR_TX
     if (is_inter) {  // This implies skip flag is 0.
-      const TX_SIZE max_tx_size = get_vartx_max_txsize(mbmi, bsize);
+      const TX_SIZE max_tx_size = get_vartx_max_txsize(mbmi, bsize, 0);
       const int bh = tx_size_high_unit[max_tx_size];
       const int bw = tx_size_wide_unit[max_tx_size];
       const int width = block_size_wide[bsize] >> tx_size_wide_log2[0];
@@ -2630,12 +2630,15 @@
       mu_blocks_high = AOMMIN(num_4x4_h, mu_blocks_high);
 
       if (is_inter_block(mbmi)) {
-        const TX_SIZE max_tx_size = get_vartx_max_txsize(mbmi, plane_bsize);
+        const TX_SIZE max_tx_size = get_vartx_max_txsize(
+            mbmi, plane_bsize, pd->subsampling_x || pd->subsampling_y);
         int block = 0;
         const int step =
             tx_size_wide_unit[max_tx_size] * tx_size_high_unit[max_tx_size];
         const int bkw = tx_size_wide_unit[max_tx_size];
         const int bkh = tx_size_high_unit[max_tx_size];
+        assert(bkw <= mu_blocks_wide);
+        assert(bkh <= mu_blocks_high);
         for (row = 0; row < num_4x4_h; row += mu_blocks_high) {
           const int unit_height = AOMMIN(mu_blocks_high + row, num_4x4_h);
           for (col = 0; col < num_4x4_w; col += mu_blocks_wide) {