daala_tx: New flattened 4-point Type-IV DST.

This change slightly improves the round-trip accuracy of the 16-point
DCT because of changes in the rounding.
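
For reference, the rounding in question is the Q13 fixed-point multiply
used throughout the patched macro, (11585*x + 4096) >> 13. A minimal
sketch of that step follows. The helper names mul_sqrt2_q13 and
mul_sqrt2_q13_floor are hypothetical and do not exist in daala_tx.c, and
the 64-bit intermediate is an assumption added here to rule out
overflow; the macro itself works directly on its operands.

```c
#include <stdint.h>

/* Hypothetical helper, not part of daala_tx.c: multiply x by
   11585/8192 ~= sqrt(2) in Q13 fixed point, rounding to nearest.
   Assumes >> on a negative value is an arithmetic shift, which C
   leaves implementation-defined. */
static int32_t mul_sqrt2_q13(int32_t x) {
  /* 4096 is half of 2^13, so adding it before the shift rounds to
     nearest. The 64-bit widening is an assumption of this sketch. */
  return (int32_t)((11585 * (int64_t)x + 4096) >> 13);
}

/* Same product without the rounding offset: the shift alone always
   rounds toward negative infinity. */
static int32_t mul_sqrt2_q13_floor(int32_t x) {
  return (int32_t)((11585 * (int64_t)x) >> 13);
}
```

With x = 2 the rounded form gives 3 and the unrounded form gives 2,
which is the kind of one-unit difference that accumulates over a
forward and inverse transform pair.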

subset-1:

new_dst2@2017-12-04T01:59:57.412Z -> new_dst4@2017-12-04T06:31:41.096Z

  PSNR | PSNR Cb | PSNR Cr | PSNR HVS |   SSIM | MS SSIM | CIEDE 2000
0.0078 | -0.0001 |  0.0198 |   0.0432 | 0.0408 |  0.0502 |    -0.0057

Change-Id: I75783ace97834af89e70c9ce3002c6f09176e343
diff --git a/av1/common/daala_tx.c b/av1/common/daala_tx.c
index f918621..a66e404 100644
--- a/av1/common/daala_tx.c
+++ b/av1/common/daala_tx.c
@@ -415,12 +415,12 @@
     q3 -= q2; \
     q0 += OD_RSHIFT1(q1); \
     q1 -= q0; \
-    t_ = (q2 - q1 + 1) >> 1; \
+    t_ = (q1 + q2 + 1) >> 1; \
     /* 11585/8192 ~= 2*Sin[Pi/4] ~= 1.4142135623730951 */ \
-    q2 = (11585*q1 + 4096) >> 13; \
+    q1 = (11585*q2 + 4096) >> 13; \
     /* 11585/8192 ~= 2*Cos[Pi/4] ~= 1.4142135623730951 */ \
-    q1 = (11585*t_ + 4096) >> 13; \
-    q2 += q1; \
+    q2 = (11585*t_ + 4096) >> 13; \
+    q1 -= q2; \
   } \
   while (0)